My presentation during the session titled "Reproducibility of computational research: methods to avoid madness" on Wednesday, 17 September 2014, during ICSB 2014, held in Melbourne, Australia.
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Standards and software: practical aids for reproducibility of computational research in systems biology
1. Standards and software: practical aids for
reproducibility of computational research
in systems biology
Michael Hucka, Ph.D. (on behalf of many)
Department of Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
Email: mhucka@caltech.edu Twitter: @mhucka
ICSB 2014, Melbourne, Australia, September 2014
2. SBML as enabler
The COMBINE family of standardization efforts
Examples of growing software capabilities
Outline
5. Communication of models in the olden days…
Describe the models and equations in a paper, and you’re done, right?
Problems:
• Errors in printing
• Missing information
• Dependencies on
implementation
• Outright errors
• Can be a huge
effort to recreate
9. SBML = Systems Biology Markup Language
Open format for representing models of biological processes
• Data structures + principles + serialization to XML
• (Mostly) Declarative description—not a scripting language
(Mostly) neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
Does not store experimental data, or simulation descriptions
Meant for software to read/write, not humans
11. SBML is a file format based on XML
Software shields you from working with this directly
12. The process is central
• Literally called “reaction” (not necessarily biochemical)
• Participants are pools of entities of the same kind (“species”)
!
!
!
na1 A + nb1 B f1(...) nc1 C
na2 A + nd2 D f2(...) ne1 E
nc3 C f3(...) nf3 F + ng3 G
...
!
• Species are located in containers (“compartments”)
- Core SBML assumes well-mixed compartments
Models can further include:
• Other constants & variables
• Discontinuous events
• Unit definitions
• Annotations
• Other, explicit math
Core SBML concepts are fairly simple
13. Example of model type Example model
!
ODE (e.g., cell differentiation)
BioModels Database model
#BIOMD0000000451
Conductance-based (e.g., Hodgin-Huxley)
BioModels Database model
#BIOMD0000000020
Typically do not use SBML “reaction” construct,
but instead use “rate rules” construct
Neural (e.g., spiking neurons)
BioModels Database model
#BIOMD0000000127
Typically use “events” for discontinuous changes
Pharmacokinetic/dynamics
BioModels Database model
#BIOMD0000000234
“Species” are not required to be biochemical entities
Infectious diseases BioModels Database model
#MODEL1008060001
List originally by Nicolas Le Novére
Core SBML constructs support many types of models
14. Status
SBML Level 3 What it
Hierarchical model composition Models containing submodels ✔
Flux balance constraints Constraint-based models ✔
Qualitative models Petri net models, Boolean models ✔
Graph layout Diagrams of models ✔
Multicomponent/state species Entities w/ structure; also rule-based models draft
Spatial Nonhomogeneous spatial models draft
Graph rendering Diagrams of models draft
Groups Arbitrary grouping of components draft
Arrays Arrays of entities draft
Dynamic structures Creation & destruction of components draft
Distributions Numerical values as statistical distributions draft
Annotations Richer annotation syntax
15. Accepted by dozens of journals *
100’s of software tools available today
• 260+ listed in SBML Software Guide †
1000’s of models available
• ... in public databases
• ... as supplementary data to papers
• ... in private repositories
!
!
!
!
* http://sbml.org/Documents/Publications_known_to_accept_submissions_in_SBML_format
† http://sbml.org/SBML_Software_Guide
http://sbml.org
Many models and software resources are available today
16. SBML enables better reproducibility of models
BioModels Database @ EBI
• 530 manually-curated models
• 650 non-curated models
• 143,000 auto-generated models
http://biomodels.net
Anecdotal report from BioModels Database curators:
• Models encoded from papers used to fail almost 100% of the time
• Models submitted in SBML format:
- 60% work right away
- 40% work “most of the time” with help from authors
17. SBML enables integration of information
SBML supports integration in multiple ways:
• SBML itself is an integrative framework
- Many different types of models all use one core framework
• Hierarchical Model Composition: integrate multiple models into one
- Permits hierarchically-structured models
- Permits libraries of reusable components
• Annotations within SBML:
- Incorporate data, notes, software-specific extensions
- Link to external data, other models, etc.—anything with a URI
20. ?
BIOMD0000000319 in BioModels Database
Decroly & Goldbeter, PNAS, 1982
Many things not addressed by SBML alone
21. Waltemath et al.,
BMC Sys Bio 5, 2011.
Efforts like SED-ML improve reproducibility
22. Motivations for the creation of COMBINE
Realizations about the state of affairs in late-2000’s
• Many efforts overlapped, but lacked coordination
• Invented their own processes from scratch
• Many separate meetings meant more travel for many people
• Limited and fragile funding didn’t support solid base
COMBINE = Computational Modeling in Biology Network
• Coordinate meetings
• Coordinate standards development
• Develop common procedures & tools
• Provide a recognized voice
25. Example: standardizing descriptions of spatial models in SBML
Virtual Cell
SBML Level 3 Spatial package
draft specification
COPASI
MCell and CellBlender
26. Example: synthetic biology and simulations
SBOL = Synthetic Biology Open Language
• Format + visual notation for exchange of genetic designs
• SBOL Developers Group includes 29 organizations
SBOL is leveraging other COMBINE standards
• Using SBML/CellML/etc. + SED-ML to describe module behavior
!
!
!
!
• Connecting to BioPAX and SBGN
• Using COMBINE Archive format
27. Example: Path2Models
Automatically generates mathematical models from pathway resources
Starts !
with
KEGG,
MetaCyc
!
and !
BioPAX
pathway
sources
!
!
!
!
!
Büchel et al., BMC Systems Biology. (2013). 7
Outputs SBML (with MIRIAM
annotations) and SBGN
!
Leverages many open resources – e.g., SABIO-RK, BioCarta, ChEBI, GO, …
29. Attendees of COMBINE 2014, Los Angeles, California, USA
Huge thanks to everyone in the COMBINE community
Attendees of COMBINE 2013, Paris, France
30. National Institute of General Medical Sciences (USA)
Air Force Office of Scientific Research (USA)
BBSRC (UK)
Beckman Institute, Caltech (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Drug Disease Model Resources (EU-EFPIA Innovative Medicine Initiate)
ELIXIR (UK)
European Molecular Biology Laboratory (EMBL)
Google Summer of Code
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Education, Culture, Sports, Science and Technology
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
Keio University (Japan)
Molecular Sciences Institute (USA)
National Science Foundation (USA)
STRI, University of Hertfordshire (UK)
SBML funding sources over the past 14 years