Slides from presentation given on November 21, 2011, at the 4th Global COE International Symposium on Physiome and Systems Biology for Integrated Life Sciences and Predictive Medicine, in Osaka, Japan.
1. SBML and related resources
and standardization efforts
Michael Hucka
Member of the Professional Staff
Computing + Mathematical Sciences
California Institute of Technology
6. SBML = Systems Biology Markup Language
Format for representing computational models
• Data structures + rules for their use + serialization to XML
Neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
A lingua franca for software (not humans)
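To make "data structures + rules for their use + serialization to XML" concrete, here is a minimal Python sketch that builds a toy model in memory and writes it out as SBML-style XML. The element and attribute names are meant to follow SBML Level 3 Version 1 Core; the model content (the ids `toy_model`, `cell`, `A`, `P`, `r1`) is invented for illustration, and the output is not validated against the specification.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 Core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def tag(name):
    """Return the namespace-qualified tag name that ElementTree expects."""
    return f"{{{SBML_NS}}}{name}"


def build_minimal_sbml():
    """Build a toy model (A -> P in one compartment) and serialize it to XML."""
    ET.register_namespace("", SBML_NS)  # write SBML as the default namespace
    sbml = ET.Element(tag("sbml"), {"level": "3", "version": "1"})
    model = ET.SubElement(sbml, tag("model"), {"id": "toy_model"})

    compartments = ET.SubElement(model, tag("listOfCompartments"))
    ET.SubElement(compartments, tag("compartment"),
                  {"id": "cell", "constant": "true"})

    species_list = ET.SubElement(model, tag("listOfSpecies"))
    for species_id in ("A", "P"):  # hypothetical species ids
        ET.SubElement(species_list, tag("species"), {
            "id": species_id,
            "compartment": "cell",
            "hasOnlySubstanceUnits": "false",
            "boundaryCondition": "false",
            "constant": "false",
        })

    reactions = ET.SubElement(model, tag("listOfReactions"))
    reaction = ET.SubElement(reactions, tag("reaction"),
                             {"id": "r1", "reversible": "false", "fast": "false"})
    for list_name, species_id in (("listOfReactants", "A"), ("listOfProducts", "P")):
        refs = ET.SubElement(reaction, tag(list_name))
        ET.SubElement(refs, tag("speciesReference"),
                      {"species": species_id, "stoichiometry": "1", "constant": "true"})

    return ET.tostring(sbml, encoding="unicode")


print(build_minimal_sbml())
```

In practice a library such as libSBML would be used to read and write such files; the standard-library XML module is used here only to keep the sketch self-contained.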
7. Basic SBML concepts are fairly simple
The reaction is central: a process occurring at a given rate
$n_a A + n_b B \xrightarrow{f([A],[B],[P],\ldots)} n_p P$
$n_c C \xrightarrow{f(\ldots)} n_d D + n_e E + n_f F$
$\vdots$
• Participants are pools of entities (species)
Models can further include:
• Other constants & variables
• Compartments
• Explicit math
• Discontinuous events
• Unit definitions
• Annotations
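The idea that a reaction is "a process occurring at a given rate" acting on species pools can be sketched in a few lines of Python: each species changes at a rate equal to its stoichiometry times the reaction rate, negative for reactants and positive for products. This is only an illustration; the example reaction A + 2B -> P, the mass-action rate law, and the constant `K` are assumptions, not taken from the slides.

```python
def derivatives(conc, reactions):
    """Return d[species]/dt for every species, summed over all reactions.

    conc      -- dict mapping species id to concentration
    reactions -- list of (reactants, products, rate) tuples, where reactants
                 and products map species id to stoichiometry and rate is a
                 function of the concentration dict
    """
    dxdt = {species: 0.0 for species in conc}
    for reactants, products, rate in reactions:
        v = rate(conc)                      # f([A],[B],[P],...)
        for species, n in reactants.items():
            dxdt[species] -= n * v          # reactants are consumed
        for species, n in products.items():
            dxdt[species] += n * v          # products are created
    return dxdt


K = 0.5  # hypothetical rate constant

# A + 2 B -> P with an assumed mass-action rate law K*[A]*[B]^2
toy_reactions = [
    ({"A": 1, "B": 2}, {"P": 1}, lambda c: K * c["A"] * c["B"] ** 2),
]

rates = derivatives({"A": 1.0, "B": 2.0, "P": 0.0}, toy_reactions)
print(rates)
```

A simulator would hand a function like `derivatives` to an ODE integrator, or use the same rates as propensities in a stochastic method, which is one way to read "neutral with respect to modeling framework".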
8. Example of a common type of model
[Figure: simulation output]
Tyson et al. (1991)
PNAS 88(16):7328–32
9. Signaling pathway models
Fernandez et al. (2006)
DARPP-32 Is a Robust Integrator of Dopamine and Glutamate Signals
PLoS Computational Biology
BioModels Database model #BIOMD0000000153
Scope of SBML encompasses many types of models
10. Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Hodgkin & Huxley (1952)
A quantitative description of membrane current and its application to conduction and excitation in nerve
J. Physiology 117:500–544
BioModels Database model #BIOMD0000000020
Scope of SBML encompasses many types of models
11. Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changes in quantitative parameters
Izhikevich EM. (2003)
Simple model of spiking neurons.
IEEE Trans Neural Net.
BioModels Database model #BIOMD0000000127
Scope of SBML encompasses many types of models
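The combination of "rate rules" and "events" can be illustrated with the Izhikevich (2003) neuron itself. In the Python sketch below, the two differential equations play the role of rate rules and the spike reset plays the role of an event. The equations and the "regular spiking" parameter values come from the cited paper; the input current, the forward-Euler integrator, and the step size are assumptions chosen for brevity, and this is not how the BioModels entry encodes the model.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, dt=0.25, t_end=200.0):
    """Integrate the Izhikevich model with forward Euler; return spike times (ms)."""
    v, u = c, b * c          # membrane potential and recovery variable
    spikes = []
    for step in range(int(t_end / dt)):
        # "Rate rules": continuous evolution of the two variables.
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        # "Event": when the trigger v >= 30 fires, apply discontinuous resets.
        if v >= 30.0:
            spikes.append((step + 1) * dt)
            v = c
            u += d
    return spikes


spike_times = simulate_izhikevich()
print(f"{len(spike_times)} spikes in 200 ms")
```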
12. Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changes in quantitative parameters
Pharmacokinetic/dynamics models
• “Species” is not required to be a biochemical entity
Tham et al. (2008)
A pharmacodynamic model for the time course of tumor shrinkage by gemcitabine + carboplatin in non-small cell lung cancer patients
Clin. Cancer Res. 14
BioModels Database model #BIOMD0000000234
Scope of SBML encompasses many types of models
13. Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changes in quantitative parameters
Pharmacokinetic/dynamics models
• “Species” is not required to be a biochemical entity
Infectious diseases
Munz et al. (2009)
When zombies attack!: Mathematical modelling of an outbreak of zombie infection
Infectious Disease Modelling Research Progress, eds. Tchuenche et al., p. 133–150
BioModels Database model #MODEL1008060001
Scope of SBML encompasses many types of models
14. SBML Level 1 SBML Level 2 SBML Level 3
SBML Level 1                        | SBML Level 2                           | SBML Level 3
predefined math functions           | user-defined functions                 | user-defined functions
text-string math notation           | MathML subset                          | MathML subset
reserved namespaces for annotations | no reserved namespaces for annotations | no reserved namespaces for annotations
no controlled annotation scheme     | RDF-based controlled annotation scheme | RDF-based controlled annotation scheme
no discrete events                  | discrete events                        | discrete events
default values defined              | default values defined                 | no default values
monolithic                          | monolithic                             | modular
15. SBML Level 3: Supporting more categories of models
[Diagram: Packages W, X, Y, and Z layered on top of SBML Level 3 Core, with dependencies between packages]
A package adds constructs & capabilities
Models declare which packages they use
• Applications tell users which packages they support
Package development can be decoupled
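How a model "declares which packages it uses" can be sketched as follows. In SBML Level 3, a package is declared through an XML namespace on the `<sbml>` element together with a namespaced `required` attribute. The Python sketch below reads those attributes and reports required packages that a tool does not support. The example document and the empty `SUPPORTED` set are invented for illustration; the `comp` namespace URI is, as far as I know, the one used by the Hierarchical Model Composition package.

```python
import xml.etree.ElementTree as ET

# Hypothetical example document declaring one package.
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1"
      level="3" version="1" comp:required="true">
  <model id="m"/>
</sbml>"""

# Package namespaces this imaginary application can interpret (none here).
SUPPORTED = set()


def declared_packages(sbml_text):
    """Map each declared package namespace to its 'required' flag."""
    root = ET.fromstring(sbml_text)
    packages = {}
    for key, value in root.attrib.items():
        # ElementTree stores namespaced attributes as "{namespace}name".
        if key.startswith("{") and key.endswith("}required"):
            namespace = key[1:key.index("}")]
            packages[namespace] = (value == "true")
    return packages


def unsupported_required(sbml_text, supported=SUPPORTED):
    """List required packages that the application cannot interpret."""
    return [ns for ns, required in declared_packages(sbml_text).items()
            if required and ns not in supported]


print(unsupported_required(EXAMPLE))
```

An application could use a check like this to tell users up front that a model relies on constructs it cannot interpret.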
20. Goal of supporting model composition is not new
CellML has always had capability
Martin Ginkel & Jörg Stelling made proposals mid-2001, 2002
• Influenced by ProMoT/DIVA
Jonathan Webb also made a proposal in 2003
• Context of Bio-SPICE project
Andrew Finney made alternate proposal in 2003, kept up discussions through 2004
SBML efforts stalled in ‘05–’06 ...
Lucian Smith & Mike Hucka restarted effort in ’10
[Background images: title page of M. Ginkel's "Modular SBML" proposal (Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany, 5th July 2002), and a poster describing SBML Level 3 proposals for model composition, arrays, and multicomponent species]
21. Composition as it is currently envisioned
Goals:
• Separate concepts of model definition vs instantiation of the model
- Can define single model definition & instantiate multiple copies
- Can create model libraries
• Selective replacement and/or deletion of entities
• Optional explicit interfaces (“ports”)
Latest proposal:
• http://www.sbml.org/Community/Wiki
• Preliminary implementation for libSBML is nearly ready
22. Scenario #1
Single submodel template instantiated multiple times in the enclosing model
File “X”
<sbml>
  Model definition “A”
  <model>
    Submodel “B”: pointer to def. “A”
    Submodel “C”: pointer to def. “A”
23. Scenario #2
Arbitrary nesting: a model instantiates another model definition that itself instantiates another model definition
File “X”
<sbml>
  Model definition “C”
  Model definition “B”
    Submodel “A”: pointer to def. “C”
  <model>
    Submodel “Z”: pointer to def. “B”
24. Scenario #3
Models in external files
File “Y”
  <model>
File “X”
<sbml>
  External model definition “B” (refers to the model in file “Y”)
  <model>
    Submodel “Z”: pointer to def. “B”
25. Links/references/replacements
Model “outer”
  Compartment “c”: S1, S2
  Model “inner”
    Compartment “q”: X1, X2
Implied model:
Model “outer”
  Compartment “c”: S1, S2, X2 (from “inner”)
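The separation of definition from instantiation, and the effect of replacements, can be sketched in Python. The data structure and the `flatten` function below are hypothetical and far simpler than the actual proposal; they only show how an "implied model" could be computed. The slide text does not say which outer object replaces X1, so the sketch assumes that S2 replaces X1 and that compartment "c" replaces compartment "q".

```python
from dataclasses import dataclass, field


@dataclass
class ModelDefinition:
    """A reusable model definition (hypothetical, heavily simplified)."""
    name: str
    compartments: list                       # compartment ids
    species: dict                            # species id -> compartment id
    submodels: dict = field(default_factory=dict)     # instance id -> definition
    replacements: dict = field(default_factory=dict)  # "instance.id" -> local id


def flatten(model):
    """Return (compartments, species) of the implied, submodel-free model."""
    compartments = list(model.compartments)
    species = dict(model.species)
    for instance, definition in model.submodels.items():
        sub_compartments, sub_species = flatten(definition)

        def resolve(local_id):
            # A replaced object takes the id of its replacement; anything
            # else is kept and prefixed with the instance id.
            return model.replacements.get(f"{instance}.{local_id}",
                                          f"{instance}__{local_id}")

        for comp_id in sub_compartments:
            new_id = resolve(comp_id)
            if new_id not in compartments:
                compartments.append(new_id)
        for species_id, comp_id in sub_species.items():
            if f"{instance}.{species_id}" in model.replacements:
                continue  # replaced by a species of the enclosing model
            species[f"{instance}__{species_id}"] = resolve(comp_id)
    return compartments, species


inner = ModelDefinition("inner", ["q"], {"X1": "q", "X2": "q"})
outer = ModelDefinition(
    "outer", ["c"], {"S1": "c", "S2": "c"},
    submodels={"sub": inner},
    replacements={"sub.q": "c", "sub.X1": "S2"},  # assumed replacements
)
print(flatten(outer))
```

Because `inner` is an ordinary object, the same definition can be instantiated several times under different instance ids, which corresponds to Scenario #1.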
28. The problem
Core SBML only supports compartments containing well-stirred mixtures
• Lacks support for defining geometric shape of compartments
• Lacks support for nonuniform molecular distributions
• Lacks support for expressing diffusion processes
The only way to do it portably in SBML is to fake it
• E.g., define a large number of small compartments...
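The "fake it" workaround can be sketched as a chain of small well-stirred compartments, with a transfer reaction between each pair of neighbours standing in for diffusion. The Python sketch below is an illustration under assumed values: the transfer constant `k`, the number of compartments, and the forward-Euler step are all invented.

```python
def diffuse_chain(amounts, k, dt, steps):
    """Approximate 1-D diffusion with a chain of well-stirred compartments.

    amounts -- initial amount of the species in each compartment
    k       -- hypothetical first-order transfer constant between neighbours
    """
    x = list(amounts)
    for _ in range(steps):
        # Net flux from compartment i to i+1, as a reversible transfer reaction.
        flux = [k * (x[i] - x[i + 1]) for i in range(len(x) - 1)]
        for i, f in enumerate(flux):
            x[i] -= dt * f
            x[i + 1] += dt * f
    return x


# All material starts in the leftmost of five compartments.
final = diffuse_chain([100.0, 0.0, 0.0, 0.0, 0.0], k=1.0, dt=0.1, steps=2000)
print(final)
```

Even this toy case needs one species and one compartment per grid cell plus a reaction per neighbouring pair, which shows why the approach scales poorly to realistic geometries.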
29. The current proposal
Main components:
• Coordinate systems
• Patches of spatial geometries, called domains
- Domain = contiguous patch of volumetric space or surface patch
• Mapping of SBML compartments, species, & parameters to domains
• Molecular transport mechanisms (e.g., advection, diffusion)
• Mapping of molecular transport mechanisms to domains
Developed & implemented by Jim Schaff of the Virtual Cell group
• (Incomplete) proposal doc at http://www.sbml.org/Community/Wiki
• Beta test implementation for libSBML available today
34. Where to learn more: SBML.org—the SBML portal
Find SBML software
35. Where to find curated, ready-to-run models
BioModels Database
http://biomodels.net/biomodels
36. Features of BioModels Database
Stores & serves quantitative models of biological interest
• Free, public resource
• Models must be described in peer-reviewed publication(s)
All models are curated by hand to reproduce published results
Imports & exports models in several formats
• SBML, CellML, SciLab, XPP, BioPAX
Today: 750+ models
Developed by Nicolas Le Novère’s group (EBI), funded by EBI & NIH
38. Model Procedures Results
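[Table, only partly recoverable from the slide text; columns: Model, Procedures, Results]
• Representation format: SBRML is the one entry given as text
• Minimal info requirements: one entry shown as “?”
• Semantics, mathematical
• Semantics, biological: annotations (in all three columns)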
40. Annotations add semantics and connections
Annotations can answer questions:
• “What other identities (synonyms) does this entity have?”
• “What exactly is the process represented by equation ‘r17’?”
• “What role does constant ‘k3’ play in equation ‘r17’?”
• “What organism are we talking about?”
• ... etc. ...
Multiple annotations on same entity are common
36
41. Le Novère et al., Nature Biotech., 23(12), 2005.
46. MIRIAM cross-references are simple triples
Element in the model → relationship qualifier (optional) → Entity elsewhere (e.g., in a database)
Format: { Data source identifier, Data item identifier, Annotation qualifier }
• Data source identifier (required): URI chosen from agreed-upon list
• Data item identifier (required): syntax & value space depends on data type
• Annotation qualifier (optional): controlled vocabulary term
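The triple structure maps naturally onto a small record type. The Python sketch below is a hypothetical illustration, not the MIRIAM library API: the class name, field names, and the `uri` method are invented, and the qualifier string `bqbiol:isVersionOf` is used only as an example of a controlled vocabulary term. It assumes that characters such as ":" inside a data item identifier are percent-encoded when the identifier is appended to the data source URI.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class MiriamAnnotation:
    """One cross-reference: data source, data item, and optional qualifier."""
    data_source: str                 # required: URI from the agreed-upon list
    item_id: str                     # required: syntax depends on the data type
    qualifier: Optional[str] = None  # optional: controlled vocabulary term

    def __post_init__(self):
        if not self.data_source or not self.item_id:
            raise ValueError("data source and data item identifiers are required")

    def uri(self):
        """Combine data source and item into a single URI (assumed encoding)."""
        return f"{self.data_source}:{quote(self.item_id, safe='')}"


annotation = MiriamAnnotation("urn:miriam:ec-code", "1.1.1.1", "bqbiol:isVersionOf")
print(annotation.uri())
```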
49. New development: identifiers.org
Provides resolvable persistent URIs
• Unlike URNs, you can type it in a web browser
Implemented as additional layer on top of MIRIAM Registry
• Provides persistent URLs to data sources
• Reference data are kept in MIRIAM Registry
Example:
• EC Code entry #1.1.1.1
- MIRIAM URN: urn:miriam:ec-code:1.1.1.1
- identifiers.org URI: http://identifiers.org/ec-code/1.1.1.1
Developed by Nicolas Le Novère, Camille Laibe, Nick Juty @ EBI
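The relationship between the two forms in the example is mechanical, as the following Python sketch suggests. The function name is invented, and the rule it applies (strip the `urn:miriam:` prefix, split off the collection name, decode any percent-encoded characters in the item identifier) is an assumption based on the example above rather than a statement of the official resolution algorithm.

```python
from urllib.parse import unquote


def miriam_urn_to_identifiers_url(urn):
    """Rewrite a MIRIAM URN as a resolvable identifiers.org URL (assumed rule)."""
    prefix = "urn:miriam:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # The first colon after the prefix separates collection and item id.
    collection, separator, item = urn[len(prefix):].partition(":")
    if not separator or not collection or not item:
        raise ValueError(f"malformed MIRIAM URN: {urn!r}")
    return f"http://identifiers.org/{collection}/{unquote(item)}"


print(miriam_urn_to_identifiers_url("urn:miriam:ec-code:1.1.1.1"))
```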
50. Model representation level
Concept due to Nicolas Le Novère
[Figure: three axes for classifying model descriptions; axis labels only partly legible in the slide text]
• Model representation level: graphical, biological, mathematical
• Model type: discrete stochastic entities, continuous lumped entities, mean field approximation, state transition
• Model life-cycle: model creation, parameter (label truncated), model annotation, model analysis, numerical results
Other forms of representation
51. Graphical representation of models
Today: broad variation in graphical notation used in biological diagrams
• Between authors, between journals, even people in same group
However, standard notations (as used in engineering) would offer benefits:
• Consistency = easier to read diagrams with less ambiguity
• Software support: verification of correctness, translation to math
52. SBGN = Systems Biology Graphical Notation
Goal: standardize the graphical notation in diagrams of biological processes
• Community-based development, à la SBML
Many groups participating
• Proceeding in “levels”
• 23 software tools so far
http://sbgn.org
53. Agencies to thank for supporting SBML & BioModels.net
National Institute of General Medical Sciences (USA)
European Molecular Biology Laboratory (EMBL)
ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)
54. People on SBML Team & BioModels Team
SBML Team: Michael Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Nicolas Rodriguez, Linda Taddeo, Akiya Jouraku, Akira Funahashi, Kimberley Begley, Bruce Shapiro, Andrew Finney, Ben Bornstein, Ben Kovitz, Hamid Bolouri, Herbert Sauro, Jo Matthews, Maria Schilstra
BioModels.net Team: Nicolas Le Novère, Camille Laibe, Nicolas Rodriguez, Nick Juty, Vijayalakshmi Chelliah, Stuart Moodie, Sarah Keating, Maciej Swat, Lukas Endler, Chen Li, Harish Dharuri, Lu Li, Enuo He, Mélanie Courtot, Alexander Broicher, Arnaud Henry, Marco Donizelli
Visionaries: Hiroaki Kitano, John Doyle
55. Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010
A huge thank you to the community