Tetherless World Constellation, RPI
Jim Hendler
Tetherless World Professor of Computer,
Web and Cognitive Sciences
Director, Institute for Data Exploration and
Applications
Rensselaer Polytechnic Institute
http://www.cs.rpi.edu/~hendler
@jahendler
Major talks at: http://www.slideshare.net/jahendler
INVESTOPEDIA – Tragedy of the commons (summary)
• The tragedy of the commons is an economic problem that
results in overconsumption, underinvestment, and
ultimately depletion of a common-pool resource.
• For a tragedy of the commons to occur, a resource must be
scarce, rivalrous in consumption, and non-excludable.
• Solutions to the tragedy of the commons include the
imposition of private property rights, government
regulation, or the development of a collective action
arrangement.
Tragedy of the data commons
• The tragedy of the DATA commons is an ongoing scientific
problem that results in under-utilization, overinvestment,
and ultimately disuse of common data resources.
• For a tragedy of the commons to occur, a resource must be
scarce, rivalrous in consumption, and non-excludable.
• Solutions to the tragedy of the commons include the
imposition of private property rights, government
regulation, or the development of a collective action
arrangement.
– Can we move to the third option?
We want data to be FAIR
• Easy to say, connotes a lot
• Harder to operationalize
• For machines
• Formats
• Standards
• …
• For humans
• Incentives
• Trust
• Training
• …
• Need models, best practices, lessons learned, etc.
The big challenge is we require sharing across large
projects
• Example: biomedical research
• Best models span disciplines
• People live in different departments at different universities
• But a compelling scientific challenge is a forcing function for people to work together
• Created incentives
• Funding is still largely by project
• Infrastructure for project data: expensive
• Infrastructure for cross-project data sharing: priceless
• Short- to mid- term solutions likely to require interoperability between
separately funded efforts
• WHICH LEADS TO THE TRAGEDY OF THE DATA COMMONS
Understanding a single domain: a long-term endeavor
Level of study: data type
• Organs: Histopathology
• Organism: Phenotype
• Circuits: Electrophysiology
• Cells: In vitro phenotype
• Pathways: Signal cascades
• Biomodules: Protein-protein interactions
• Protein: Proteome
• RNA: Transcriptome
• DNA: Genome
• Population: Epidemiology
© G. Bhanavar, IBM, IJCAI ‘16
Solution requires Interoperability
• One reason the Web beat its competitors…
• Gopher
• Archie
• FTP
• …
• Provided a lightweight standard that allowed interoperability between these and more
• Web was built on “coop-etition”
• How do we learn this lesson for data sharing?
FAIR requires sharable metadata
But that ontology stuff never works….
• Ontology
– Hard to build
– Expensive to maintain
– Don’t map to people’s data
– Rarely reused
• Aren’t ontologies why the sharing parts of FAIR are so hard?
CHEAR Ontology Effort
The Children’s Health Exposure Analysis Resource, or CHEAR, is a program funded
by the National Institute of Environmental Health Sciences to advance understanding
about how the environment impacts children’s health and development over the
course of a lifetime.
https://chearprogram.org/
Children’s Health Exposure
Analysis Resource (CHEAR)
McGuinness 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
CHEAR is composed of three components:
• A National Exposure Assessment Laboratory Network, providing both targeted and untargeted environmental exposure and biological response analyses in human samples
• A Data Repository, Analysis, and Science Center, providing statistical services, a data repository, and data standards for integration and sharing
• A Coordinating Center, connecting the research community to CHEAR resources
Goal: Encode the terminology currently needed by the CHEAR Data Center Portal and publish an open-source, extensible ontology that integrates general exposure science and health, leveraging best-in-class terminologies.
Enabling Findable, Accessible, Interoperable, Reusable data and services to support data analysis and interdisciplinary research
Ontologies encode terms and their interrelationships, providing a foundation for interoperability and reusability (the I and R in FAIR)
Ontology-enabled infrastructures (knowledge graphs and ontology-enabled search services) also provide support for finding and accessing relevant content (the F and A in FAIR)
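One way to picture ontology-enabled search is query expansion over term relationships: a search for a broad term also finds data tagged with narrower terms. The following is only a minimal sketch under stated assumptions. The three-term hierarchy, the sample records, and the function names are hypothetical illustrations, not part of the CHEAR ontology or its services.

```python
# Hypothetical mini-ontology: child term -> parent term (not taken from CHEAR).
SUBCLASS_OF = {
    "lead": "heavy metal",
    "heavy metal": "chemical exposure",
    "phthalate": "chemical exposure",
}


def ancestors(term, subclass_of):
    """Return every broader term reachable from `term`, nearest first."""
    chain, seen = [], {term}
    while term in subclass_of:
        term = subclass_of[term]
        if term in seen:  # guard against accidental cycles in the mapping
            break
        seen.add(term)
        chain.append(term)
    return chain


def search(records, query, subclass_of):
    """Find records tagged with `query` itself or with any narrower term."""
    return [
        r for r in records
        if r["tag"] == query or query in ancestors(r["tag"], subclass_of)
    ]


# Hypothetical sample records, each tagged with one ontology term.
SAMPLES = [
    {"id": "s1", "tag": "lead"},
    {"id": "s2", "tag": "phthalate"},
    {"id": "s3", "tag": "heavy metal"},
]

if __name__ == "__main__":
    hits = search(SAMPLES, "heavy metal", SUBCLASS_OF)
    print([r["id"] for r in hits])
```

A plain keyword match on "heavy metal" would miss the record tagged "lead"; the encoded relationship is what makes it findable.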
Child Health Exposure
Analysis Resource Ontology
Stingone, Mervish, Kovatch, McGuinness, Gennings, Teitelbaum. Big and Disparate Data: Considerations for
Pediatric Consortia. Current Opinion in Pediatrics. 29(2):231-239, April 2017.
doi: 10.1097/MOP.0000000000000467. PMID: 28134706
Ontology Foundations
Imported Ontologies:
●Semanticscience Integrated Ontology (SIO)
●PROV-O
●Units Ontology
●Human-Aware Science Ontology (HAScO)
●Virtual Solar Terrestrial Observatory (Instruments) (VSTO-I)
●Environment Ontology (ENVO)
●…
Minimum Information to Reference an External Ontology Term (MIREOT)-ed Ontologies:
●Chemical Entities of Biological Interest (ChEBI)
●Statistics Ontology (STAT-O)
●PubChem
●UBERON (Anatomy)
●Disease Ontology (DO)
●UniProt (Proteins)
●Cogat (Cognitive Measures)
●ExO
●RefMet, …
Annotations:
●Simple Knowledge Organization System (SKOS)
●Dublin Core (DC) Terms
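The difference between importing a whole ontology and MIREOT-ing a term can be sketched as a record with three fields. MIREOT's minimum information is the source ontology, the source term, and the superclass under which the term is placed in the importing ontology. The class name, function name, and all example.org URIs below are hypothetical placeholders, not identifiers from CHEAR or from any of the ontologies listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermReference:
    """Minimum information kept when borrowing one term from another ontology."""
    source_ontology: str    # URI of the ontology the term comes from
    source_term: str        # URI of the borrowed term itself
    target_superclass: str  # where the term sits in the importing ontology


def group_by_source(references):
    """Map each source ontology URI to the list of term URIs borrowed from it."""
    grouped = {}
    for ref in references:
        grouped.setdefault(ref.source_ontology, []).append(ref.source_term)
    return grouped


# Hypothetical references; every URI here is a placeholder.
REFERENCES = [
    ExternalTermReference(
        "http://example.org/chem",
        "http://example.org/chem/Term_001",
        "http://example.org/local/Chemical",
    ),
    ExternalTermReference(
        "http://example.org/chem",
        "http://example.org/chem/Term_002",
        "http://example.org/local/Chemical",
    ),
    ExternalTermReference(
        "http://example.org/anatomy",
        "http://example.org/anatomy/Term_010",
        "http://example.org/local/BodyPart",
    ),
]

if __name__ == "__main__":
    for source, terms in group_by_source(REFERENCES).items():
        print(source, len(terms))
```

Keeping only these three fields per term is what lets a large external vocabulary be reused without loading all of it.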
CHEAR Ontology
Foundations and Reuse
McCusker, Rashid, Liang, Liu, Chastain, Pinheiro, Stingone, McGuinness. Broad, Interdisciplinary Science In Tela: An
Exposure and Child Health Ontology. In Proceedings of Web Science, 2017. Troy, NY. 349-357.
Meta-data as evolving resource, not predefined standard
• Move from hard meta-data standards to sharable
resources
– Referenceable by links
• Linked data principles apply to linking of metadata
– This is a key part of the original semantic web vision
• To date still working best “within cultures” …
– Which solves the “grounding” problems
• But growing interdisciplinary links
– As more sharing occurs
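Treating metadata as something referenceable by links can be sketched with two data dictionaries whose column names differ but whose annotations point at the same term URIs. This is only an illustration under stated assumptions: the project names, column names, and example.org URIs are hypothetical, and real projects would link to published ontology terms.

```python
# Each project annotates its own column names with links to shared terms.
# All column names and URIs below are hypothetical placeholders.
PROJECT_A = {
    "pb_blood": "http://example.org/terms/BloodLeadLevel",
    "age_mo": "http://example.org/terms/AgeInMonths",
    "site": "http://example.org/terms/StudySite",
}
PROJECT_B = {
    "lead_conc": "http://example.org/terms/BloodLeadLevel",
    "child_age": "http://example.org/terms/AgeInMonths",
    "bmi": "http://example.org/terms/BodyMassIndex",
}


def shared_columns(a, b):
    """Pair up columns from two projects that link to the same term URI."""
    column_by_term = {term: column for column, term in b.items()}
    return sorted(
        (column_a, column_by_term[term])
        for column_a, term in a.items()
        if term in column_by_term
    )


if __name__ == "__main__":
    for col_a, col_b in shared_columns(PROJECT_A, PROJECT_B):
        print(f"{col_a} <-> {col_b}")
```

Neither project had to adopt the other's column names or a predefined standard; the shared links carry the alignment.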
Lessons learned
• This was a crucial component in creating
– Usable (lightweight) semantics for Web Apps (schema.org)
– Usable (lightweight) semantics for govt data sharing (DCAT)
– Successful scientific efforts
• Virtual Observatory
• Deep Carbon Observatory
• Health Data Research UK (parts thereof to date)
Details omitted for time
• Multiple large projects have been working with
– Provenance
– Curation and Versioning
• Archiving
– Consistency
• Only partial overlap
– Credit and citation
– Interdisciplinary term mappings
– Term reconciliation
– Computational infrastructure (esp cloud)
– Third party (bottom) data curation via learning *
– …
How does this beat the data commons problem?
• Total project: ~$60M
• Data management: ~$10M
• CHEAR ontology: ~$1M
Reuse and linking of the data via ontological (metadata) development is a fraction of the total project cost, but key to project success
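The proportions behind "a fraction of the total project cost" can be worked out directly from the three approximate figures on the slide. This is a rough calculation only, since the inputs are rounded estimates; the variable names are illustrative.

```python
# Approximate figures from the slide, in US dollars.
TOTAL_PROJECT = 60_000_000
DATA_MANAGEMENT = 10_000_000
ONTOLOGY = 1_000_000

# Ontology cost as a share of the whole project and of data management.
ontology_share_of_total = ONTOLOGY / TOTAL_PROJECT
ontology_share_of_data_mgmt = ONTOLOGY / DATA_MANAGEMENT
data_mgmt_share_of_total = DATA_MANAGEMENT / TOTAL_PROJECT

if __name__ == "__main__":
    print(f"Ontology / total project:   {ontology_share_of_total:.1%}")
    print(f"Ontology / data management: {ontology_share_of_data_mgmt:.1%}")
    print(f"Data management / total:    {data_mgmt_share_of_total:.1%}")
```

On these rounded numbers the ontology effort is under 2% of the project and about a tenth of the data management budget.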
Can this beat the data commons problem?
• Projects can share data at a fraction of the cost if they
– Start from overlapping common metadata terms
• Not a single standard
– Each put aside a relatively small cost for the metadata team
• Embedded in data team
– Have their metadata teams work together to the extent possible
• Reusing the metadata leads to reusability of the data
[Figure: nested boxes labeled Project, Data, Metadata]
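"Overlapping common metadata terms, not a single standard" can be made concrete by counting which terms are used by more than one project. The sketch below assumes each project publishes the set of terms it uses; the project names, term names, and the two-project threshold are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical term lists for three separately funded projects.
PROJECTS = {
    "project_a": ["age", "sex", "blood_lead", "site"],
    "project_b": ["age", "sex", "bmi"],
    "project_c": ["age", "blood_lead", "diet"],
}


def term_usage(projects):
    """Count how many projects use each metadata term."""
    usage = Counter()
    for terms in projects.values():
        usage.update(set(terms))  # count each term once per project
    return usage


def common_terms(projects, min_projects=2):
    """Terms shared by at least `min_projects` projects."""
    usage = term_usage(projects)
    return sorted(t for t, n in usage.items() if n >= min_projects)


def overlap(terms_a, terms_b):
    """Jaccard overlap between two projects' term sets (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(common_terms(PROJECTS))
    print(overlap(PROJECTS["project_a"], PROJECTS["project_b"]))
```

No project uses every term, yet each pair shares enough to link some data, which is the partial overlap the slide argues for.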
Questions?
https://idea.rpi.edu
Manufacturing Data Problem
– DARPA Open Manufacturing Performers (Honeywell, Lockheed Martin,
Boeing etc.) generated TBs of metal AM process, testing and
characterization data.
– Data management requirements (Materials Genome Initiative)
– Over a period of time, DARPA’s data server looks like this
[Image credit: www.existentialennui.com]
“Good data”, but of little use in its current form!
Our Approach
Step 1: “Pick up the books”
• Drill into the data files
Step 2: “Develop basic Dewey decimal system”
• Use domain expertise to realize “functional ontologies” to anchor the data sets.
Step 3: “What Type of Display Case?”
• Faceted search-based visualization of data
• Meaningful interaction with data
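Faceted search amounts to two operations: narrowing records by the facet values selected so far, and counting the values that remain in each facet. Here is a minimal sketch, assuming each data file has already been described by a few ontology-anchored fields. The field names and records are hypothetical and are not the DARPA Open Manufacturing data.

```python
from collections import Counter

# Hypothetical file descriptions; fields and values are illustrative only.
RECORDS = [
    {"material": "alloy A", "process": "powder bed", "test": "tensile"},
    {"material": "alloy A", "process": "powder bed", "test": "fatigue"},
    {"material": "alloy B", "process": "powder bed", "test": "tensile"},
    {"material": "alloy B", "process": "directed energy", "test": "tensile"},
]


def filter_records(records, selected):
    """Keep records matching every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records)


if __name__ == "__main__":
    tensile = filter_records(RECORDS, {"test": "tensile"})
    print(len(tensile), dict(facet_counts(tensile, "material")))
```

The counts shown next to each remaining facet value are what turn a pile of files into something a user can browse.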
Step 4: “Read & Discover New Knowledge”
• Apply machine learning on the data sets.
• Train & then predict for untested conditions.
Grand Vision: Data-driven Inverse Design for
AM Part Qualification Paradigm
Machine Learning Example
(Composites Testing Data)
Objective: Classify majority failure modes (interfacial/cohesive) based on
input parameters (Surface Preparation, Contaminate Type, Contaminate
Amount)
• Data set (n=562) randomly partitioned into
training set (n=395) and test set (n=167). Each
trial partitions the data differently.
[Figure: Typical validation output (confusion matrix)
from a single trial. Green cells are correct
predictions. Gray cells are incorrect
predictions.]
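The evaluation protocol described here (a fresh random 395/167 split per trial, summarized as a confusion matrix) can be sketched with the standard library alone. Everything data-related below is an assumption for illustration: the records are placeholder labels rather than the composites testing data, and a majority-class baseline stands in for the slide's unspecified classifier, so the output says nothing about the reported results.

```python
import random
from collections import Counter

LABELS = ["interfacial", "cohesive"]

# Placeholder records (id, label); NOT the actual composites testing data.
RECORDS = [(i, LABELS[0] if i % 3 else LABELS[1]) for i in range(562)]


def partition(records, n_train, seed):
    """Randomly split records into a training set and a test set."""
    rng = random.Random(seed)  # a different seed gives a different trial
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def run_trial(seed):
    """One trial: split, fit a majority-class baseline, score on the test set."""
    train, test = partition(RECORDS, 395, seed)
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    actual = [label for _, label in test]
    predicted = [majority] * len(test)  # stand-in for a real classifier
    return confusion_matrix(actual, predicted, LABELS)


if __name__ == "__main__":
    for seed in range(3):
        print(seed, run_trial(seed))
```

The diagonal of each matrix corresponds to the green (correct) cells in the figure and the off-diagonal entries to the gray (incorrect) ones.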

Weitere ähnliche Inhalte

Was ist angesagt?

Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterprisePhilip Bourne
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesRothamsted Research, UK
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update James Hendler
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopCarly Strasser
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of PublishingAnita de Waard
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13Kristi Holmes
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...Natalie Stanford
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ LibraryARDC
 
Building Capacity for Open Science
Building Capacity for Open ScienceBuilding Capacity for Open Science
Building Capacity for Open ScienceKaitlin Thaney
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centreJisc
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the libraryColleen DeLory
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clarkdatascienceiqss
 

Was ist angesagt? (20)

Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital Enterprise
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
The Semantic Web: 2010 Update
The Semantic Web: 2010 Update The Semantic Web: 2010 Update
The Semantic Web: 2010 Update
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories Workshop
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharing
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
Building Capacity for Open Science
Building Capacity for Open ScienceBuilding Capacity for Open Science
Building Capacity for Open Science
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centre
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the library
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
 
Dash for IASSIST 2014
Dash for IASSIST 2014Dash for IASSIST 2014
Dash for IASSIST 2014
 

Ähnlich wie Tragedy of the (Data) Commons

Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017Vivien Bonazzi
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - finalKathy Fontaine
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewPhilip Bourne
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data SciencePhilip Bourne
 
WOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web ObservatoriesWOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web Observatoriesgloriakt
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?Graham Pryor
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Anita de Waard
 
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12 SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12 ASIS&T
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangePhilip Bourne
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
Infrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAInfrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAResearch Data Alliance
 
PhRMA Some Early Thoughts
PhRMA Some Early ThoughtsPhRMA Some Early Thoughts
PhRMA Some Early ThoughtsPhilip Bourne
 
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and RealityA VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality Paul Courtney
 

Ähnlich wie Tragedy of the (Data) Commons (20)

Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - final
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's View
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
 
Martone grethe
Martone gretheMartone grethe
Martone grethe
 
WOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web ObservatoriesWOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web Observatories
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
 
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12 SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12
SEAD: Sustainable Environment-Actionable Data - Robert McDonald - RDAP12
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Infrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAInfrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDA
 
PhRMA Some Early Thoughts
PhRMA Some Early ThoughtsPhRMA Some Early Thoughts
PhRMA Some Early Thoughts
 
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and RealityA VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
 

Mehr von James Hendler

Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it mattersJames Hendler
 
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")James Hendler
 
Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs James Hendler
 
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
The Future of AI: Going BeyondDeep Learning, Watson, and the Semantic WebThe Future of AI: Going BeyondDeep Learning, Watson, and the Semantic Web
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic WebJames Hendler
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...James Hendler
 
KR in the age of Deep Learning
KR in the age of Deep LearningKR in the age of Deep Learning
KR in the age of Deep LearningJames Hendler
 
Digital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AIDigital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AIJames Hendler
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataJames Hendler
 
Social Machines - 2017 Update (University of Iowa)
Social Machines - 2017 Update (University of Iowa)Social Machines - 2017 Update (University of Iowa)
Social Machines - 2017 Update (University of Iowa)James Hendler
 
Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...Social Machines: The coming collision of Artificial Intelligence, Social Netw...
Social Machines: The coming collision of Artificial Intelligence, Social Netw...James Hendler
 
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...James Hendler
 
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?James Hendler
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebJames Hendler
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
Watson: An Academic's Perspective
Watson: An Academic's PerspectiveWatson: An Academic's Perspective
Watson: An Academic's PerspectiveJames Hendler
 
Semantic Web: The Inside Story
Semantic Web: The Inside StorySemantic Web: The Inside Story
Semantic Web: The Inside StoryJames Hendler
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupJames Hendler
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science EducationJames Hendler
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 

Mehr von James Hendler (20)

Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it matters
 
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
 
Enhancing Precision Wellness with Personal Health Knowledge Graphs
Tragedy of the (Data) Commons

  • 1. Tetherless World Constellation, RPI Jim Hendler Tetherless World Professor of Computer, Web and Cognitive Sciences Director, Institute for Data Exploration and Applications Rensselaer Polytechnic Institute http://www.cs.rpi.edu/~hendler @jahendler Major talks at: http://www.slideshare.net/jahendler
  • 2. INVESTOPEDIA – Tragedy of the commons (summary) • The tragedy of the commons is an economic problem that results in overconsumption, underinvestment, and ultimately depletion of a common-pool resource. • For a tragedy of the commons to occur a resource must be scarce, rivalrous in consumption, and non-excludable. • Solutions to the tragedy of the commons include the imposition of private property rights, government regulation, or the development of a collective action arrangement.
  • 3. Tragedy of the data commons • The tragedy of the DATA commons is an ongoing scientific problem that results in under-utilization, overinvestment, and ultimately disuse of common data resources. • For a tragedy of the commons to occur a resource must be scarce, rivalrous in consumption, and non-excludable. • Solutions to the tragedy of the commons include the imposition of private property rights, government regulation, or the development of a collective action arrangement. – Can we move to the third option?
  • 4. We want data to be FAIR • Easy to say, connotes a lot • Harder to operationalize • For machines • Formats • Standards • … • For humans • Incentives • Trust • Training • … • Need models, best practices, lessons learned, etc.
  • 5. The big challenge is we require sharing across large projects • Example: biomedical research • Best models span disciplines • People live in different departments at different universities • But a compelling scientific challenge is a forcing function for people to work together • Created incentives • Funding is still largely by project • Infrastructure for project data: expensive • Infrastructure for cross-project data sharing: priceless • Short- to mid-term solutions likely to require interoperability between separately funded efforts • WHICH LEADS TO THE TRAGEDY OF THE DATA COMMONS
  • 6. Understanding a single domain: a long-term endeavor • Figure labels (level / area of study): Organs / Histopathology • Organism / Phenotype • Circuits / Electrophysiology • Cells / In vitro phenotype • Pathways / Signal Cascades • Biomodules / Protein:Protein Interactions • Protein / Proteome • RNA / Transcriptome • DNA / Genome • Population / Epidemiology • © G. Bhanavar, IBM, IJCAI ‘16
  • 7. Solution requires Interoperability • One reason the Web beat its competitors… • Gopher • Archie • FTP • … • Provided a lightweight standard that allowed interoperability between these and more • Web was built on “coop-etition” • How do we learn this lesson for data sharing?
  • 8. FAIR requires sharable metadata
  • 9. But that ontology stuff never works…. • Ontology – Hard to build – Expensive to maintain – Don’t map to people’s data – Rarely reused • Aren’t ontologies why the sharing parts of FAIR are so hard?
  • 10. That ontology stuff never works… • Ontology – Hard to build – Expensive to maintain – Don’t map to people’s data – Rarely reused • Aren’t they why the sharing part of FAIR is so hard?
  • 11. CHEAR Ontology Effort: Children’s Health Exposure Analysis Resource (CHEAR) • The Children’s Health Exposure Analysis Resource, or CHEAR, is a program funded by the National Institute of Environmental Health Sciences to advance understanding about how the environment impacts children’s health and development over the course of a lifetime. https://chearprogram.org/ • CHEAR is composed of three components: – A National Exposure Assessment Laboratory Network, providing both targeted and untargeted environmental exposure and biological response analyses in human samples – A Data Repository, Analysis, and Science Center, providing statistical services, a data repository, and data standards for integration and sharing – A Coordinating Center, connecting the research community to CHEAR resources • McGuinness 9/9/19. Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
  • 12. CHEAR Ontology Effort: Child Health Exposure Analysis Resource Ontology • Goal: Encode terminology currently needed by the CHEAR Data Center Portal, and publish an open source, extensible ontology integrating general exposure science and health, leveraging best-in-class terminologies. • Enabling Findable, Accessible, Interoperable, Reusable data and services to support data analysis and interdisciplinary research • Ontologies encode terms and their interrelationships, providing a foundation for understanding interoperability and reusability (the I and R in FAIR) • Ontology-enabled infrastructures (Knowledge Graphs and ontology-enabled search services) also provide support for finding and accessing relevant content (the F and A in FAIR) • Stingone, Mervish, Kovatch, McGuinness, Gennings, Teitelbaum. Big and Disparate Data: Considerations for Pediatric Consortia. Current Opinion in Pediatrics. 29(2):231-239, April 2017. doi: 10.1097/MOP.0000000000000467. PMID: 28134706
  • 13. Ontology Foundations: CHEAR Ontology Foundations and Reuse • Imported Ontologies: ● Semantic Science Integrated Ontology (SIO) ● PROV-O ● Units Ontology ● Human-Aware Science Ontology (HAScO) ● Virtual Solar Terrestrial Observatory (Instruments) (VSTO-I) ● Environment Ontology (ENVO) ● … • Minimum Information to Reference an External Ontology Term (MIREOT)-ed Ontologies: ● Chemical Entities of Biological Interest (ChEBI) ● Statistics Ontology (STAT-O) ● PubChem ● UBERON (Anatomy) ● Disease Ontology (DO) ● UniProt (Proteins) ● Cogat (Cognitive Measures) ● ExO ● RefMet, … • Annotations: ● Simple Knowledge Organization System (SKOS) ● Dublin Core (DC) Terms • McCusker, Rashid, Liang, Liu, Chastain, Pinheiro, Stingone, McGuinness. Broad, Interdisciplinary Science In Tela: An Exposure and Child Health Ontology. In Proceedings of Web Science, 2017. Troy, NY. 349-357.
  • 14. Meta-data as evolving resource, not predefined standard • Move from hard meta-data standards to sharable resources – Referenceable by links • Linked data principles apply to linking of metadata – This is a key part of the original semantic web vision • To date still working best “within cultures” … – Which solves the “grounding” problems • But growing interdisciplinary links – As more sharing occurs
  • 15. Lessons learned • This was a crucial component in creating – Usable (lightweight) semantics for Web Apps (schema.org) – Usable (lightweight) semantics for govt data sharing (DCAT) – Successful scientific efforts • Virtual Observatory • Deep Carbon Observatory • Health Data Research UK (parts thereof to date)
  • 16. Details omitted for time • Multiple large projects have been working with – Provenance – Curation and Versioning • Archiving – Consistency • Only partial overlap – Credit and citation – Interdisciplinary term mappings – Term reconciliation – Computational infrastructure (esp cloud) – Third party (bottom) data curation via learning * – …
  • 17. How does this beat the data commons problem? • Total Project: ~$60M • Data Mgt: ~$10M • CHEAR ontology: ~$1M • Reuse and linking of the data via ontological (metadata) development is a fraction of the total project cost, but key to project success
  • 18. Can this beat the data commons problem? • Projects can share data at a fraction of the cost if they – Start from overlapping common metadata terms • Not a single standard – Each put aside a relatively small cost for the metadata team • Embedded in data team – Their metadata teams work together to the extent possible • Reusing the metadata leads to reusability of the data • (Figure labels: Project, Data, Metadata)
  • 19. Questions? https://idea.rpi.edu
  • 20. Manufacturing Data Problem – DARPA Open Manufacturing Performers (Honeywell, Lockheed Martin, Boeing etc.) generated TBs of metal AM process, testing and characterization data. – Data management requirements (Materials Genome Initiative) – Over a period of time… DARPA’s data server looks like this (image: www.existentialennui.com) • “Good data” but little use in its current form!
  • 21. Our Approach • Step 1: “Pick up the books”: drill into the data files • Step 2: “Develop basic Dewey decimal system”: use domain expertise to realize “functional ontologies” to anchor the data sets.
  • 22. Our Approach • Step 3: “What Type of Display Case?” • Faceted search-based visualization of data • Meaningful interaction with data
  • 23. Our Approach • Step 4: “Read & Discover New Knowledge” • Apply machine learning on the data sets. • Train & then predict for untested conditions. • Grand Vision: Data-driven Inverse Design for AM Part Qualification Paradigm
  • 24. Machine Learning Example (Composites Testing Data) • Objective: Classify majority failure modes (interfacial/cohesive) based on input parameters (Surface Preparation, Contaminate Type, Contaminate Amount) • Data set (n=562) randomly partitioned into training set (n=395) and test set (n=167). Each trial partitions the data differently. • Figure: Typical validation output (confusion matrix) from a single trial. Green cells are correct predictions. Gray cells are incorrect predictions.
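
The evaluation protocol on the last slide (a fresh random 395/167 partition per trial, scored with a confusion matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not name the classifier or publish the data, so the records here are synthetic placeholders, `majority_baseline` is a hypothetical stand-in model, and the function names are invented for this sketch. Only the sizes (562, 395, 167) and the two class labels come from the slide.

```python
import random
from collections import Counter

LABELS = ("interfacial", "cohesive")  # failure modes named on the slide


def partition(records, n_train, seed):
    """Shuffle a copy of the records and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Return counts[actual_label][predicted_label]."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def majority_baseline(train_labels):
    """Hypothetical placeholder model: always predict the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


# Synthetic stand-in for the 562 test records (NOT the real composites data).
# Each record is (features, label); the three features are random placeholders
# for surface preparation, contaminate type, and contaminate amount.
_rng = random.Random(0)
records = [
    ((_rng.random(), _rng.random(), _rng.random()), _rng.choice(LABELS))
    for _ in range(562)
]

for trial in range(3):
    # Each trial uses a different seed, so the data is partitioned differently.
    train, test = partition(records, n_train=395, seed=trial)
    model = majority_baseline([label for _, label in train])
    actual = [label for _, label in test]
    predicted = [model(features) for features, _ in test]
    matrix = confusion_matrix(actual, predicted)
    # Diagonal cells are correct predictions; off-diagonal cells are errors.
    correct = sum(matrix[label][label] for label in LABELS)
    print(f"trial {trial}: {correct}/{len(test)} correct, matrix={matrix}")
```

The diagonal of the matrix corresponds to the green (correct) cells in the figure and the off-diagonal entries to the gray (incorrect) cells. Swapping in a real classifier only requires replacing `majority_baseline` with something that returns a callable from features to a label.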