SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Standardized biological
knowledge graphs: The
BioLink Model
Chris Mungall
2018-04-13
Challenge: making representations of biological
knowledge interoperable
OMIM
MGI
HGNC
FlyBase
ClinVar
CTD
DrugBank
UniProt
BGeeDb
GO
SGD
RGD PomBase Monarch
WormBasePharmGKB
Reactome
GWAS
catalog
CHEMBL
ENSEMBL
DrugBank
BioGrid
KEGG
Panther
ZFIN
Xen
Base
Animal
QTLdb
What do we mean by knowledge here?
● Data, sensu lato : collection of values in some organized form
○ Data, sensu stricto: Output of a data collection process
■ Instrumentation or observation; raw or processed; not altered by curation
■ Serves role as evidence
■ E.g. read count in RNAseq experiment OR examination of KO mouse
○ Metadata: Data about data (or more typically) datasets
■ May be curated at source, post-hoc, manually or automatically
■ E.g. details about an RNAseq experiment (factors, instrumentation, sample prep)
○ Knowledge: Propositional assertions inferred from data
■ Something you need evidence for
■ E.g.
● gene G is expressed in tissue T under condition C
● Knocking out G gives rise to phenotype P with high penetrance
● Many bio-”databases” are actually “knowledge bases” (by this definition)
● Usual caveats:
○ Other definitions available, divisions can be murky, this is a guide rather than dogma, etc
Solution: Standard schema/datamodel for all of
biology?
Haven’t we been here before?
http://www.mged.org/Meetings/presentations/OMG/sld019.htm
Haven’t we been here before?
http://www.mged.org/Meetings/presentations/OMG/sld019.htm
Complexity and fluidity of biological knowledge vs
schema rigidity
// hypothetical strawman schema
class Gene {
String: name
String: function
String: phenotype
Protein: product
Int: start
Int: end
String: chromosome
}
Bad assumption:
- Genes actually have multiple functions
- String representation rather than vocab
Bad assumption:
- Different builds?
- Should be inherited from generic
seq feature
Bad assumption:
- Genes can have multiple products
- Products not necessarily genes
- What about transcript, exon, ...
}
The backwards evolution of schema languages
● 80s: ER, SQL DDL
○ Basis in FOL, formal algebra/calculus
● 90s: OO, UML, Description Logics
○ Rich polymorphism
● 00s: XML, SOAP
○ Can’t even...
● 10s: JSON and JSON-Schema
○ No polymorphism
○ Limited typing
○ Tree-based
○ Geared towards web-apps, not rich modeling
What works: Open-ended knowledge representation
using RDF Graphs plus OWL
● RDF: minimal
representation
model for
representing simple
facts as edges
● OWL: encodes
semantics about
RDF graphs
Success of OWL:
Bio-Ontologies
● One datamodel (OWL),
covers rich variety of
interconnected biology
● APIs, SPARQL, ...
http://obofoundry.org/ontology/uberon.html
Analogous approach in biological databases
● GMOD Chado
● Graph-like database layered
over RDBMS
● Allowed flexibility and
extensibility
● Large uptake by small MODs
Mungall, C. J., Emmert, D. B., et al. (2007) A. Bioinformatics, 23(13),
i337-346. http://doi.org/10.1093/bioinformatics/btm189
https://github.com/GMOD/Chado
Knowledge Graphs, the most pluripotent representation of data, are no longer as exotic or
experimental as they were 10 years ago. Goofaceamazonlink etc are all using them to some degree.
Challenge: too much flexibility
● With flexible schema-free graph-based
representations, multiple ways of modeling
things
● OWL provides semantic open-world
biological constraints
○ All genes are located_on exactly 1 chromosome
● Software often needs more rigid closed-
world information model constraints
○ Information System A: gene can be located on
multiple contigs/scaffolds
○ Information System B: locational info not relevant
BioLink Model Approach
● Define a powerful underlying metamodel
○ Mix aspects of closed-world UML and open-world OWL
○ Build for extensibility
○ Define exports: UML, SQL DDL, GraphQL, Json-Schema, Java, ...
● Define core biological types (E)
○ Gene, disease, anatomical entity, disease, ...
○ Cede detailed typology to ontologies
● Define core properties (R)
○ Id, name, synonym
○ Part-of, interacts-with, gives-rise-to
● Define taxonomy of relationships (extension of R)
○ Gene-gene-interaction, gene-tissue-expression
● Extensibility through use-case specific profiles
https://biolink.github.io/biolink-model
Browsing the model
● YAML source
● Autogenerated website docs: https://biolink.github.io/biolink-model
● OWL export
○ Protege
○ Bioportal
● JSON-Schema (lossy unless working in JSON-LD)
● GraphQL (lossy)
● UML Diagrams (lossy)
https://biolink.github.io/biolink-model
https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml
https://github.com/biolink/biolink-model/tree/master/ontology
https://bioportal.bioontology.org/ontologies/BLM/
https://biolink.github.io/biolink-model/
Entities
Relationships
Aka assertions, facts, propositions, reified triples, edges ...
Profiles
● Different projects require different views of the data
○ E.g. omission/inclusion of different fields
○ Denormalizations
○ Inlining vs referencing
● Metamodel supports remixing and mixins
● One core conceptual model
● Different serializations for different profiles
● Well-defined transforms
● Caveat: this part is not well documented yet
How do I use it? How do I get data?
● Data model is serialization neutral
○ Plus: Flexible
○ Negative: Additional layer of abstraction
● RDF/Turtle serialization
○ http://data.monarchinitiative.org/ttl/
○ Turtle conforms to association patterns
● Property graphs
○ http://neo4j.monarchinitiative.org/
● JSON
○ Challenge: lack of polymorphism
○ Available via generic model or specific models
○ API http://api.monarchinitiative.org/api/
○ Preview: https://data.monarchinitiative.org/json/
○ BDBags of JSON coming soon
What NOT to use the biolink-model for
● Raw data
● Metadata about a dataset
● ..
● However..
○ Underlying metamodel may be useful in providing flexible representations of these
○ Currently aligning with FHIR metamodel
How does this relate to KC7?
● One view: DC is about data sensu stricto, and metadata
○ Search = lightweight ontology (syns + subsumption) + metadata datamodels
○ “Knowledge bases” have their own specialized search interfaces developed by specialists
○ No role for a standard KM in DC
● Counterview
○ We’re not trying to compete with bio-KBs
○ We want to leverage knowledge to enhance data search
■ Analogous to how google KG enhances google search
○ Example:
■ Find TopMed studies relevant to my disease
● Exploit KG linkages between disease-phenotype, phenotype-variable, phenotype-
gene

Weitere ähnliche Inhalte

Was ist angesagt?

A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity ResolutionBenjamin Bengfort
 
LOD (linked open data) part 2 lod 구축과 현황
LOD (linked open data) part 2   lod 구축과 현황LOD (linked open data) part 2   lod 구축과 현황
LOD (linked open data) part 2 lod 구축과 현황LiST Inc
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityJoshua Shinavier
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchNeo4j
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Databricks
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataGregg Kellogg
 
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...Neo4j
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQLOlaf Hartig
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 

Was ist angesagt? (20)

ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 
LOD (linked open data) part 2 lod 구축과 현황
LOD (linked open data) part 2   lod 구축과 현황LOD (linked open data) part 2   lod 구축과 현황
LOD (linked open data) part 2 lod 구축과 현황
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked Data
 
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 

Ähnlich wie Introduction to the BioLink datamodel

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxChris Mungall
 
Scaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesScaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesChris Mungall
 
Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledgeBenjamin Good
 
Exploring Large Chemical Data Sets
Exploring Large Chemical Data SetsExploring Large Chemical Data Sets
Exploring Large Chemical Data Setskylelutz
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxChris Mungall
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesOntotext
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryMarcus Hanwell
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseJustin Clark-Casey
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeChris Mungall
 
Avogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryAvogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryMarcus Hanwell
 
Experiences with logic programming in bioinformatics
Experiences with logic programming in bioinformaticsExperiences with logic programming in bioinformatics
Experiences with logic programming in bioinformaticsChris Mungall
 
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of DatadipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of DataeXascale Infolab
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DSRoopesh Kohad
 
Graph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBGraph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBAndrei KUCHARAVY
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Timothy Danford
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsBenjamin Good
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerRothamsted Research, UK
 

Ähnlich wie Introduction to the BioLink datamodel (20)

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptx
 
Scaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesScaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciences
 
Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledge
 
Exploring Large Chemical Data Sets
Exploring Large Chemical Data SetsExploring Large Chemical Data Sets
Exploring Large Chemical Data Sets
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
 
Chado introduction
Chado introductionChado introduction
Chado introduction
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of life
 
Avogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryAvogadro 2 and Open Chemistry
Avogadro 2 and Open Chemistry
 
Experiences with logic programming in bioinformatics
Experiences with logic programming in bioinformaticsExperiences with logic programming in bioinformatics
Experiences with logic programming in bioinformatics
 
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of DatadipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
 
L15.pptx
L15.pptxL15.pptx
L15.pptx
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DS
 
Graph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBGraph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDB
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity Models
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
 

Mehr von Chris Mungall

LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)Chris Mungall
 
LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupChris Mungall
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Chris Mungall
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in UberonChris Mungall
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)Chris Mungall
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Chris Mungall
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...Chris Mungall
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributionsChris Mungall
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesChris Mungall
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyChris Mungall
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Chris Mungall
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataChris Mungall
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesChris Mungall
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshopChris Mungall
 
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013Chris Mungall
 

Mehr von Chris Mungall (20)

LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
 
LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite Group
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in Uberon
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributions
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologies
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation Ontology
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015
 
ENVO GSC 2015
ENVO GSC 2015ENVO GSC 2015
ENVO GSC 2015
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017
 
Kboom phenoday-2016
Kboom phenoday-2016Kboom phenoday-2016
Kboom phenoday-2016
 
BioMake PAG 2017
BioMake PAG 2017 BioMake PAG 2017
BioMake PAG 2017
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype Data
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and Diabetes
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshop
 
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
Increased Expressivity of Gene Ontology Annotations - Biocuration 2013
 

Kürzlich hochgeladen

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 

Kürzlich hochgeladen (20)

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Introduction to the BioLink datamodel

  • 1. Standardized biological knowledge graphs: The BioLink Model Chris Mungall 2018-04-13
  • 2. Challenge: making representations of biological knowledge interoperable OMIM MGI HGNC FlyBase ClinVar CTD DrugBank UniProt BGeeDb GO SGD RGD PomBase Monarch WormBasePharmGKB Reactome GWAS catalog CHEMBL ENSEMBL DrugBank BioGrid KEGG Panther ZFIN Xen Base Animal QTLdb
  • 3. What do we mean by knowledge here? ● Data, sensu lato : collection of values in some organized form ○ Data, sensu stricto: Output of a data collection process ■ Instrumentation or observation; raw or processed; not altered by curation ■ Serves role as evidence ■ E.g. read count in RNAseq experiment OR examination of KO mouse ○ Metadata: Data about data (or more typically) datasets ■ May be curated at source, post-hoc, manually or automatically ■ E.g. details about an RNAseq experiment (factors, instrumentation, sample prep) ○ Knowledge: Propositional assertions inferred from data ■ Something you need evidence for ■ E.g. ● gene G is expressed in tissue T under condition C ● Knocking out G gives rise to phenotype P with high penetrance ● Many bio-”databases” are actually “knowledge bases” (by this definition) ● Usual caveats: ○ Other definitions available, divisions can be murky, this is a guide rather than dogma, etc
  • 5. Haven’t we been here before? http://www.mged.org/Meetings/presentations/OMG/sld019.htm
  • 6. Haven’t we been here before? http://www.mged.org/Meetings/presentations/OMG/sld019.htm
  • 7. Complexity and fluidity of biological knowledge vs schema rigidity // hypothetical strawman schema class Gene { String: name String: function String: phenotype Protein: product Int: start Int: end String: chromosome } Bad assumption: - Genes actually have multiple functions - String representation rather than vocab Bad assumption: - Different builds? - Should be inherited from generic seq feature Bad assumption: - Genes can have multiple products - Products not necessarily genes - What about transcript, exon, ... }
  • 8. The backwards evolution of schema languages ● 80s: ER, SQL DDL ○ Basis in FOL, formal algebra/calculus ● 90s: OO, UML, Description Logics ○ Rich polymorphism ● 00s: XML, SOAP ○ Can’t even... ● 10s: JSON and JSON-Schema ○ No polymorphism ○ Limited typing ○ Tree-based ○ Geared towards web-apps, not rich modeling
  • 9. What works: Open-ended knowledge representation using RDF Graphs plus OWL ● RDF: minimal representation model for representing simple facts as edges ● OWL: encodes semantics about RDF graphs
  • 10. Success of OWL: Bio-Ontologies ● One datamodel (OWL), covers rich variety of interconnected biology ● APIs, SPARQL, ... http://obofoundry.org/ontology/uberon.html
  • 11. Analogous approach in biological databases ● GMOD Chado ● Graph-like database layered over RDBMS ● Allowed flexibility and extensibility ● Large uptake by small MODs Mungall, C. J., Emmert, D. B., et al. (2007) A. Bioinformatics, 23(13), i337-346. http://doi.org/10.1093/bioinformatics/btm189 https://github.com/GMOD/Chado
  • 12. Knowledge Graphs, the most pluripotent representation of data, are no longer as exotic or experimental as they were 10 years ago. Goofaceamazonlink etc are all using them to some degree.
  • 13. Challenge: too much flexibility ● With flexible schema-free graph-based representations, multiple ways of modeling things ● OWL provides semantic open-world biological constraints ○ All genes are located_on exactly 1 chromosome ● Software often needs more rigid closed- world information model constraints ○ Information System A: gene can be located on multiple contigs/scaffolds ○ Information System B: locational info not relevant
  • 14. BioLink Model Approach ● Define a powerful underlying metamodel ○ Mix aspects of closed-world UML and open-world OWL ○ Build for extensibility ○ Define exports: UML, SQL DDL, GraphQL, Json-Schema, Java, ... ● Define core biological types (E) ○ Gene, disease, anatomical entity, disease, ... ○ Cede detailed typology to ontologies ● Define core properties (R) ○ Id, name, synonym ○ Part-of, interacts-with, gives-rise-to ● Define taxonomy of relationships (extension of R) ○ Gene-gene-interaction, gene-tissue-expression ● Extensibility through use-case specific profiles https://biolink.github.io/biolink-model
  • 15. Browsing the model ● YAML source ● Autogenerated website docs: https://biolink.github.io/biolink-model ● OWL export ○ Protege ○ Bioportal ● JSON-Schema (lossy unless working in JSON-LD) ● GraphQL (lossy) ● UML Diagrams (lossy) https://biolink.github.io/biolink-model
  • 21. Relationships Aka assertions, facts, propositions, reified triples, edges ...
  • 22. Profiles ● Different projects require different views of the data ○ E.g. omission/inclusion of different fields ○ Denormalizations ○ Inlining vs referencing ● Metamodel supports remixing and mixins ● One core conceptual model ● Different serializations for different profiles ● Well-defined transforms ● Caveat: this part is not well documented yet
  • 23. How do I use it? How do I get data? ● Data model is serialization neutral ○ Plus: Flexible ○ Negative: Additional layer of abstraction ● RDF/Turtle serialization ○ http://data.monarchinitiative.org/ttl/ ○ Turtle conforms to association patterns ● Property graphs ○ http://neo4j.monarchinitiative.org/ ● JSON ○ Challenge: lack of polymorphism ○ Available via generic model or specific models ○ API http://api.monarchinitiative.org/api/ ○ Preview: https://data.monarchinitiative.org/json/ ○ BDBags of JSON coming soon
  • 24. What NOT to use the biolink-model for ● Raw data ● Metadata about a dataset ● .. ● However.. ○ Underlying metamodel may be useful in providing flexible representations of these ○ Currently aligning with FHIR metamodel
  • 25. How does this relate to KC7? ● One view: DC is about data sensu stricto, and metadata ○ Search = lightweight ontology (syns + subsumption) + metadata datamodels ○ “Knowledge bases” have their own specialized search interfaces developed by specialists ○ No role for a standard KM in DC ● Counterview ○ We’re not trying to compete with bio-KBs ○ We want to leverage knowledge to enhance data search ■ Analogous to how google KG enhances google search ○ Example: ■ Find TopMed studies relevant to my disease ● Exploit KG linkages between disease-phenotype, phenotype-variable, phenotype- gene