Possibilities for integrating model-related data in computational biology (DILS 2013)
1. Possibilities for Integrating Model-related Data in Computational Biology
Databases in Life Sciences, Montreal, July 2013
Dagmar Waltemath, University of Rostock, Germany
Nicolas Le Novère, Babraham Institute, UK
Michel Dumontier, Carleton University, Canada
7. Introduction
1. How can we distribute models with all
information necessary to reuse them (MIRIAM)?
2. How can we effectively manage different types
of model-related data?
3. How can we link model-related data to the rest
of the world?
13-07-12 Integrating model-related data 7
9. 1. Distributing models
The COMBINE archive v0.1
• single “.zip” file
• bundles models and model-related data
http://co.mbine.org/documents/archive
10. 1. Distributing models
1. A manifest file, "manifest.xml",
2. all described files,
3. a metadata file, "metadata.*",
4. remaining files.
• All documents necessary for the description of a model and all associated data and procedures.
• In the future: also references to documents
<?xml version="1.0" encoding="utf-8"?>
<omexManifest
    xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="./manifest.xml"
      format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model/model.xml"
      format="http://identifiers.org/combine.specifications/sbml"/>
  <content location="./simulation.xml"
      format="http://identifiers.org/combine.specifications/sedml"/>
  <content location="./article.pdf"
      format="application/pdf"/>
  <content location="./metadata.rdf"
      format="http://identifiers.org/combine.specifications/omex-metadata"/>
</omexManifest>
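The bundling idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation of the COMBINE archive: the helper names `build_manifest` and `build_archive` and the placeholder file contents are assumptions made for this example, while the namespace and format URIs are copied from the manifest shown on the slide.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# URIs taken from the manifest on the slide.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest.xml as bytes for a list of (location, format) pairs."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The manifest lists itself first, as in the slide's example.
    all_entries = [("./manifest.xml", SPEC + "omex-manifest")] + list(entries)
    for location, fmt in all_entries:
        ET.SubElement(
            root,
            f"{{{OMEX_NS}}}content",
            {"location": location, "format": fmt},
        )
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_archive(files):
    """Bundle files into one zip; files maps path -> (format, payload bytes)."""
    entries = [("./" + path, fmt) for path, (fmt, _) in files.items()]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for path, (_, payload) in files.items():
            archive.writestr(path, payload)
    return buffer.getvalue()


# Hypothetical placeholder contents, for illustration only.
example_files = {
    "model/model.xml": (SPEC + "sbml", b"<sbml/>"),
    "simulation.xml": (SPEC + "sedml", b"<sedML/>"),
}
archive_bytes = build_archive(example_files)
```

Writing the manifest as the first zip entry is a choice of this sketch, so that a reader can inspect it before touching any other file.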
12. 2. Managing models
• Neo4J database
• Model2graph mapping
• Rich relations
http://biomodels.net/qualifiers
• Links to annotations
“Which models are annotated with ‘Adenosine tri-phosphate’?”
“Which models contain reactions with ATP as reactant and ADP as product?”
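[Figure: graph representation of a model. Nodes: Document, Model, model components (P, E, C, R, S), and annotations (SBO:0000268, uniprot:P07101, uniprot:Q03393, GO:0005737, HGNC:8582). Edge labels: is, isVersionOf, isEncodedBy, asProduct, asReactant, asModifier.]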
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
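The second example question can be answered by a short traversal over such a graph. The sketch below uses a plain in-memory edge store rather than Neo4J, so it only illustrates the idea. The class `ModelGraph`, the `hasReaction` relation, the edge directions, and the annotation strings "ATP" and "ADP" are assumptions of this example; only the labels `is`, `asReactant`, and `asProduct` come from the figure.

```python
from collections import defaultdict


class ModelGraph:
    """Tiny labelled graph: (source, relation) -> set of targets."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        # .get avoids creating empty entries in the defaultdict.
        return self.edges.get((source, relation), set())


def annotated_as(graph, species_nodes, annotation):
    """True if any of the species carries the annotation via an 'is' edge."""
    return any(annotation in graph.targets(s, "is") for s in species_nodes)


def models_with_reaction(graph, models, reactant, product):
    """Models containing a reaction that consumes `reactant` and yields `product`."""
    hits = []
    for model in models:
        for reaction in graph.targets(model, "hasReaction"):
            reactants = graph.targets(reaction, "asReactant")
            products = graph.targets(reaction, "asProduct")
            if annotated_as(graph, reactants, reactant) and annotated_as(
                graph, products, product
            ):
                hits.append(model)
                break  # one matching reaction is enough for this model
    return hits


# Toy data, invented for illustration.
g = ModelGraph()
g.add("model1", "hasReaction", "r1")
g.add("r1", "asReactant", "s1")
g.add("r1", "asProduct", "s2")
g.add("model2", "hasReaction", "r2")
g.add("r2", "asReactant", "s2")
g.add("r2", "asProduct", "s1")
g.add("s1", "is", "ATP")
g.add("s2", "is", "ADP")

matching = models_with_reaction(g, ["model1", "model2"], "ATP", "ADP")
```

Matching on annotations instead of species names is what lets the query find models that name the same molecule differently.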
13. 2. Managing models
• Lucene-based ranked retrieval
“Give me the best matching model published about the Cell Cycle and covering forms of cdc.”
Lucene query: "cdc*" AND "Cell Cycle"
http://www.ebi.ac.uk/biomodels-demo/
Henkel et al. (2010), Bioinformatics
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
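To make the query semantics concrete, here is a toy ranking function in Python. It is not Lucene and does not reproduce Lucene's scoring: it only mimics the shape of the query above, a prefix wildcard combined by AND with an exact phrase, and ranks by raw match counts. The function names and the sample texts are invented for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text, prefix, phrase):
    """Count prefix and phrase matches; 0 unless both match (AND)."""
    tokens = tokenize(text)
    prefix_hits = sum(1 for t in tokens if t.startswith(prefix.lower()))
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    phrase_hits = sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens
    )
    if prefix_hits == 0 or phrase_hits == 0:
        return 0
    return prefix_hits + phrase_hits


def rank(documents, prefix, phrase):
    """Return names of matching documents, best score first, ties by name."""
    scored = [(score(text, prefix, phrase), name) for name, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in ordered if s > 0]


# Invented sample descriptions, not real BioModels entries.
documents = {
    "a": "Cell cycle model with cdc2 and cdc25",
    "b": "A cell cycle model including cdc25",
    "c": "Glycolysis in yeast",
    "d": "cdc25 regulation only",
}
ranking = rank(documents, "cdc", "Cell Cycle")
```

A real index would also weight fields such as title, annotations, and publication text differently, which this sketch leaves out.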
14. 2. Managing models
• Representing simulation descriptions
• ... and other types of model-related data
“Give me all possible simulations that show the dependency of the Cell Cycle on the concentration of cdc25.”
Fig.: Henkel et al. (2012) INFORMATIK 2012, Braunschweig
16. 3. Integrating model data
Bio2RDF: at the heart of Linked Data for the Life Sciences
• Free and open source
• Based on Semantic Web standards
• Billions of interlinked statements from dozens
of conventional and high value datasets
• Partnerships with EBI, NCBI, DBCLS, NCBO,
OpenPHACTS, and commercial tool providers
Datasets cover:
• chemicals/drugs/formulations
• genomes/genes/proteins, domains
• interactions, complexes & pathways
• BioModels
• animal models and phenotypes
• disease, genetic markers, treatments
• terminologies & publications
17. 3. Integrating model data
# Get all biochemical reactions in BioModels that are kinds of "protein
# catabolic process", as defined by the Gene Ontology (in the BioPortal endpoint).
# OPTION (TRANSITIVE) is a Virtuoso-specific extension.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  ?go rdfs:label ?label .
  ?go rdfs:subClassOf ?tgo OPTION (TRANSITIVE) .
  ?tgo rdfs:label ?tlabel .
  FILTER regex(?tlabel, "^protein catabolic process")
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
Gene Ontology annotation                                        Number of reactions
protein catabolic process [go:0030163]                          51
cellular protein catabolic process [go:0044257]                 26
modification-dependent protein catabolic process [go:0019941]   1
beta-amyloid formation [go:0034205]                             1
“Give me all reactions in BioModels Database that represent protein catabolic processes.”
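The core of the query is a transitive walk up the subclass hierarchy followed by a per-term count. The Python sketch below reproduces that logic on toy data, without any SPARQL endpoint. The term identifiers (`T:root`, `T:child`, and so on), the reaction names, and the decision to count the root term itself are assumptions of this example, not data from Bio2RDF or the Gene Ontology.

```python
from collections import Counter


def ancestors(term, parents):
    """All terms reachable from `term` by following subClassOf edges."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def count_reactions(root, parents, reaction_terms):
    """Count reactions per annotated term, keeping only `root` and its subclasses."""
    counts = Counter()
    for reaction, term in reaction_terms.items():
        # Assumption of this sketch: the root term itself also counts.
        if term == root or root in ancestors(term, parents):
            counts[term] += 1
    return counts


# Toy hierarchy: child -> list of direct parents (hypothetical identifiers).
parents = {
    "T:child": ["T:root"],
    "T:grandchild": ["T:child"],
    "T:other": ["T:top"],
}
# Toy annotations: reaction -> term it is identical to.
reaction_terms = {
    "r1": "T:root",
    "r2": "T:child",
    "r3": "T:child",
    "r4": "T:grandchild",
    "r5": "T:other",
}
counts = count_reactions("T:root", parents, reaction_terms)
```

In the federated query, the hierarchy walk and the reaction lookup run against two different endpoints; here both are ordinary dictionaries.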
18. Summary
Approach: COMBINE archive
Features: File bundle
• Easy access to all model-related data through one single file
Purpose: Shipping files

Approach: Graph-DB (MORRE)
Features: Network of interrelated nodes
• IR techniques easily applicable
• No schema
• Link models and simulations
Purpose: Managing existing model data

Approach: BIO2RDF
Features: Semantic integration of knowledge
• Automated reasoning
• No schema
• Linking into LOD
Purpose: Full integration