Invited presentation given at the Whole-Cell Modeling Summer School, held in Rostock, Germany, March 2015.
https://sites.google.com/site/vwwholecellsummerschool/important-dates/programm
A summary of various COMBINE standardization activities
1. A summary of various COMBINE
standardization activities
Michael Hucka, Ph.D.
Department of Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
Whole-cell modeling summer school, Rostock, Germany, March 2015
8. Developing standards is not easy
Need bring together people with diverse set of knowledge & skills
• Scientific needs
• Technical implementation skills
• Practical experience
Need manage multiple phases
• Creation
• Evolution
• Support
9. Developing standards is not easy
Need bring together people with diverse set of knowledge & skills
• Scientific needs
• Technical implementation skills
• Practical experience
Need manage multiple phases
• Creation
• Evolution
• Support
} This is just for the specification of the
standards, to say nothing of the necessary
software and other infrastructure!
10. Very recent example of how doing a poor job can be costly
Claim: published FBA results are incorrect.
Reality: poor file format led to misinterpretation.
12. Realizations about the state of affairs in late-2000’s
• Many efforts overlapped, but lacked coordination
• Efforts were inventing their own processes from scratch
• Many individual meetings meant more travel for many people
• Limited and fragile funding didn’t support solid, coherent base
COMBINE = Computational Modeling in Biology Network
• Coordinate meetings
• Coordinate standards development
• Develop common procedures & tools
• Provide a recognized voice
Motivations for the creation of COMBINE
15. Example of practical infrastructure provided by COMBINE
Common URI scheme for specification documents
!
!
!
!
!
!
E.g.: http://identifiers.org/combine.specifications/sbgn.er.level-1.version-1
• Resolves to a page that lists where the specification can be found
- Actual document can be stored anywhere
• Hierarchical structure
18. Communication of models in the olden days…
Describe the models and equations in a paper, and you’re done, right?
Problems:
• Errors in printing
• Missing information
• Dependencies on
implementation
• Outright errors
• Can be a huge
effort to recreate
Solution: use a standard
electronic format
19.
20. Format for representing models of biological processes
• Data structures + principles + serialization to XML
• (Mostly) Declarative, not procedural—not a scripting language
(Mostly) neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
Does not store experimental data, or simulation descriptions
• But software may write their own metadata (annotations) in SBML
For software to read/write, not humans
SBML = Systems Biology Markup Language
22. SBML is a file format based on XML
Software shields you from working with this directly
23. The process is central
• Literally called“reaction”(not necessarily biochemical)
• Participants are pools of entities of the same kind (“species”)
!
!
!
!
• Species are located in containers (“compartments”)
- Core SBML assumes well-mixed compartments
Models can further include:
• Other constants & variables
• Discontinuous events
Core SBML concepts are fairly simple
• Unit definitions
• Annotations
• Other, explicit math
na1 A nb1 B+ nc1 C
f1(...)
na2 A nd2 D+ ne1 E
f2(...)
...
nc3 C nf3 F
f3(...)
+ ng3 G
24. Core SBML constructs support many types of models
!
ODE (e.g., cell differentiation)
Conductance-based (e.g., Hodgin-Huxley)
Typically do not use SBML “reaction” construct,
but instead use “rate rules” construct
Neural (e.g., spiking neurons)
Typically use “events” for discontinuous changes
Pharmacokinetic/dynamics
“Species”arenotrequiredtobebiochemicalentities
Infectious diseases BioModels Database model
#MODEL1008060001
BioModels Database model
#BIOMD0000000451
BioModels Database model
#BIOMD0000000020
BioModels Database model
#BIOMD0000000127
BioModels Database model
#BIOMD0000000234
Example of model type Example model
List originally by Nicolas Le Novére
25. Community-based development process
Defines process for
• Proposing changes and
additions to SBML and
SBML packages
• Developing specifications
• Voting
• The roles of editors
!
Small changes
forthcoming in package
requirements and procedures
26. Frank Bergmann
Andreas Dräger
Michael Hucka
Brett Olivier
Sven Sahle
DagmarWaltemath
SBML Editors
Stefan Hoops
Sarah Keating
Nicolas Le Novère
Chris Myers
James Schaff
Lucian Smith
DarrenWilkinson
PastCurrent
(chair)
27. SBML Level 1 SBML Level 2 SBML Level 3
predefined math functions user-defined functions user-defined functions
text-string math notation MathML subset MathML subset
reserved namespaces for
annotations
no reserved namespaces
for annotations
no reserved namespaces
for annotations
no controlled annotation
scheme
RDF-based controlled
annotation scheme
RDF-based controlled
annotation scheme
no discrete events discrete events discrete events
default values defined default values defined no default values
monolithic monolithic modular & extensible
29. SBML Level 3 What it
Hierarchical model composition Models containing submodels ✔
Flux balance constraints Constraint-based models ✔
Qualitative models Petri net models, Boolean models ✔
Graph layout Diagrams of models ✔
Multicomponent/state species Entities w/ structure; also rule-based models draft
Spatial Nonhomogeneous spatial models draft
Graph rendering Diagrams of models draft
Groups Arbitrary grouping of components draft
Arrays Arrays of entities draft
Dynamic structures Creation & destruction of components draft
Distributions Numerical values as statistical distributions draft
Annotations Richer annotation syntax
Status
30. SBML Level 3‘comp’: Hierarchical Model Composition
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
Core SBML
31. SBML Level 3‘comp’: Hierarchical Model Composition
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
Core SBML
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
With hierarchical model composition
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“B”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“C”
32. The‘comp’package supports multiple arrangements
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“B”
Separate files (possibly in databases)
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“C”
Model“C”
Model“D”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“D”
Model“B”
(Think of
libraries of
tested
models.)
33. Results can be“flattened”to plain SBML Level 3 Core
Facility implemented in libSBML
Allows tools to read L3 +‘comp’models as if they were just plain L3
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“B”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“C”
Model“C”
Model“D”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“D”
Model“B”
Species ...
Compartments ...
Parameters ...
Reactions ...
Model“A”
SBML Level 3 CoreOriginal SBML Level 3 Core + SBML‘comp’
34. Layer on top of SBML Level 3 Core to define syntax for constraint-based
(e.g., flux-balance analysis) models
Nicknamed‘fbc’
Developed by Brett Olivier and
Frank Bergmann
SBML Level 3‘fbc’: Flux Balance Constraints
35. Layer on top of SBML Level 3 Core to define syntax for constraint-based
(e.g., flux-balance analysis) models
Nicknamed‘fbc’
Developed by Brett Olivier and
Frank Bergmann
SBML Level 3‘fbc’: Flux Balance Constraints
New version was just agreed upon last week!
New draft specification is under development.
36. SBML Level 3‘distrib’: Distributions
Layer on top of SBML Level 3 Core to provide mechanisms for:
1. Sampling a random value from a probability distribution
2. Describe uncertainty about values of elements
Uses UncertML 3.0 for the definition of various distribution functions
38. Need to capture the processes applied to models
?
BIOMD0000000319 in BioModels Database
Decroly & Goldbeter, PNAS, 1982
39. Format to capture procedures, algorithms, parameter values
• Neutral format for encoding the steps to go from model to output
Can be used for
• Simulation experiments encoding parametrizations & perturbations
• Simulations using more than one model and/or method
• Data manipulations to produce plot(s)
!
SED-ML = Simulation Experiment Description ML
40. Software apps & libraries available for SED-ML Level 1 v.1
Some SED-ML-compatible software today:
• libSedML
• jlibsedml
• SBW Simulation Tool
• CellDesigner
• Web tools
• others
http://sedml.org
42. Graphical representation of models
Today: broad variation in graphical notation used in biological diagrams
• Between authors, between journals, even people in same group
However, standard notations would offer benefits:
• Consistency = easier to read diagrams with less ambiguity
• Software support: verification of correctness, translation to math
43. SBGN = Systems Biology Graphical Notation
Goal: standardize the graphical notation in diagrams of biological processes
3 sublanguages to describe different facets of a model
• Process Diagram: causal sequences of processes & their results
- A node represents a given state of an entity
• Entity Relationship: interactions bet. entities regardless of sequence
- A node represents an entity regardless of state
• Activity Flow: information flowing from one entity to another
- Hybrid — shows flow of activity without state transitions
Languages reuse same symbols, but their interpretations are different
44. SBGN support today
Being used in publications
Numerous software tools and databases
• API libraries are under development
See http://sbgn.org for more
Martin et al., Autophagy, Jan. 2013
Reactome Database — http://reactome.org
46. Huge thanks to everyone in the COMBINE community
Attendees of COMBINE 2013, Paris, France
Attendees of COMBINE 2014, Los Angeles, California, USA
47. NationalInstituteofGeneralMedicalSciences(USA)
Air Force Office of Scientific Research (USA)
BBSRC (UK)
Beckman Institute, Caltech (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Drug Disease Model Resources (EU-EFPIA Innovative Medicine Initiate)
ELIXIR (UK)
European Molecular Biology Laboratory (EMBL)
Google Summer of Code
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Education, Culture, Sports, Science andTechnology
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
Keio University (Japan)
Molecular Sciences Institute (USA)
National Science Foundation (USA)
STRI, University of Hertfordshire (UK)
SBML funding sources over the past 14 years