
SBML (the Systems Biology Markup Language), model databases, and other resources

Tutorial given at the 2012 Computational Cell Biology Summer School at Cold Spring Harbor Laboratory, New York, USA, in August, 2012.


  1. 1. SBML (the Systems Biology Markup Language), model databases, and other resources Michael Hucka, Ph.D. Department of Computing + Mathematical Sciences California Institute of Technology Pasadena, CA, USA Email: mhucka@caltech.edu Twitter: @mhucka CCB 2012, August 2012, Cold Spring Harbor Laboratory, NY, USA
  2. 2. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  3. 3. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  4. 4. Research today: experimentation, computation, cogitation
  5. 5. The many roles of computation in biological research Instrument/device control, data management, data processing, database applications, statistical analysis, pattern matching, image processing, text mining, chemical structure prediction, genomic sequence analysis, proteomics, other *omics, molecular modeling, molecular dynamics, kinetic simulation, simulated evolution, phylogenetics, ... (to name only a subset)! Focus here: modeling and simulation
  6. 6. What are the outcomes of modeling and simulation? Usually, there are at least two scientific outcomes: • One or more models (+ associated claims about their behaviors) • Publication of the results (in some form) Models come in many forms
  7. 7. Models are results Models serve as statements of our current understanding of the phenomena being studied* • A computational model documents your theory in a concrete form Models can: • Reduce ambiguity in communication • Offer a concrete framework for adding new data and theories • Support direct evaluation of relationships between theories * Bower & Bolouri, Computational modeling of genetic and biochemical networks, MIT Press, 2001
  8. 8. But only if the modeling results are reproducible
  9. 9. Is it enough to describe the model & equations in a paper? Many models have traditionally been published this way Problems: • Errors in printing • Missing information • Dependencies on implementation • Outright errors • Can be a huge effort to recreate
  10. 10. Is it enough to make your (software X) script available? It’s vital for good science: • Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc. • Opinion: you should always do this in any case
  11. 11. Is it enough to make your (software X) code available? It’s vital for good science— • Someone with access to the same software can try to run it, understand it, build on it, etc. • Opinion: you should always do this in any case But it’s still not ideal for communication of scientific results: • What if they don’t have access to that software? • And anyway, how will people find the model? • And how will people be able to relate the model to other work?
  12. 12. Different tools, different interfaces & languages
  13. 13. Communication is better with interoperable data formats
  14. 14. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  15. 15. SBML: a lingua franca for software
  16. 16. SBML = Systems Biology Markup Language Format for representing computational models of biological processes • Data structures + usage principles + serialization to XML Neutral with respect to modeling framework • E.g., ODE, stochastic systems, etc. Development started in 2000, with first specification distributed in 2001
  17. 17. Basic SBML concepts are fairly simple The process is central • Called a “reaction” in SBML • Participants are pools of entities (species) Models can further include: • Other constants & variables • Unit definitions • Compartments • Annotations • Explicit math • Discontinuous events
  18. 18. Well-stirred compartments [diagram: two compartments labeled c and n]
  19. 19. Species pools are located in compartments [diagram labels: protein A, protein B, gene, mRNAn, mRNAc]
  20. 20. Reactions can involve any species anywhere
  21. 21. Reactions can cross compartment boundaries
  22. 22. Reaction/process rates can be (almost) arbitrary formulas [diagram: rate formulas f1(x) to f5(x) attached to the reactions]
  23. 23. “Rules”: equations expressing relationships in addition to the reaction system [diagram: rules g1(x), g2(x), ...]
  24. 24. “Events”: discontinuous actions triggered by system conditions • Event1: when (...condition...), do (...assignments...) • Event2: when (...condition...), do (...assignments...) • ...
  25. 25. Annotations: machine-readable semantics and links to other resources • “This is identified by GO id # ...” • “This is an enzymatic reaction with EC # ...” • “This is a transport into the nucleus ...” • “This compartment represents the nucleus ...” • “This event represents ...”
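
The concepts on slides 17 to 25 (compartments, species pools located in compartments, reactions that may cross compartment boundaries) can be illustrated with a small sketch. The model text below is hypothetical and written only for this example; it uses the SBML Level 2 Version 4 namespace and Python's standard XML parser rather than an SBML library, so it checks nothing about SBML validity.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4 documents.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Hypothetical toy model (not from the slides): mRNA exported from n to c.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_export">
    <listOfCompartments>
      <compartment id="n"/>
      <compartment id="c"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="mRNAn" compartment="n"/>
      <species id="mRNAc" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="export" reversible="false">
        <listOfReactants><speciesReference species="mRNAn"/></listOfReactants>
        <listOfProducts><speciesReference species="mRNAc"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(sbml_text):
    """Return (species -> compartment, reaction -> (reactants, products))."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    location = {
        s.get("id"): s.get("compartment")
        for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)
    }
    reactions = {}
    for r in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [x.get("species") for x in
                     r.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [x.get("species") for x in
                    r.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[r.get("id")] = (reactants, products)
    return location, reactions


def crosses_boundary(location, reactants, products):
    """True if the participants of a reaction sit in more than one compartment."""
    return len({location[s] for s in reactants + products}) > 1


location, reactions = summarize(SBML_TEXT)
for rid, (reactants, products) in reactions.items():
    print(rid, reactants, "->", products,
          "crosses boundary:", crosses_boundary(location, reactants, products))
```

For real work, a dedicated library such as libSBML or JSBML (both covered later in the deck) is the better choice, since plain XML parsing knows nothing about SBML semantics.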
  26. 26. Scope of SBML encompasses many types of models Today: spatially homogeneous models • Metabolic network models • Signaling pathway models • Conductance-based models • Neural models • Pharmacokinetic/dynamics models • Infectious diseases Find examples in BioModels Database: http://biomodels.net/biomodels Coming: SBML Level 3 packages to support other types • E.g.: Spatially inhomogeneous models, also qualitative/logical
  27. 27. Model scale & complexity have been increasing Many significant and popular models are in SBML form Example (2342 reactions): Herrgård et al., “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology”, Nature Biotech., 26:10, 2008; reconstruction available in SBML at http://www.comp-sys-bio.org/yeastnet [figure: first page of the paper]
  28. 28. SBML Levels compared • SBML Level 1: predefined math functions; text-string math notation; reserved namespaces for annotations; no controlled annotation scheme; no discrete events; default values defined; monolithic • SBML Level 2: user-defined functions; MathML subset; no reserved namespaces for annotations; RDF-based controlled annotation scheme; discrete events; default values defined; monolithic • SBML Level 3: user-defined functions; MathML subset; no reserved namespaces for annotations; RDF-based controlled annotation scheme; discrete events; no default values; modular
  29. 29. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  30. 30. You want models? We got models.
  31. 31. BioModels Database Stores & serves quantitative models of biological interest • Free, public resource • Models must be described in peer-reviewed publication(s) Hundreds of models are curated by hand Imports & exports models in several formats Figure courtesy of Camille Laibe
  32. 32. BioModels Database http://biomodels.net/biomodels
  33. 33. Contents of BioModels Database Contents today: • 142,000+ pathway models (converted from KEGG) • 400+ hand-curated quantitative models • 400+ non-curated quantitative models [pie chart of model categories, with shares ranging from 3% to 25%: signal transduction, metabolic process, multicellular organismal process, rhythmic process, cell cycle, homeostatic process, response to stimulus, cell death, localization, others (e.g., developmental process)] Database data from 2012-08-10
  34. 34. How can you check that a given SBML file is valid?
  35. 35. The Online SBML Validator
  36. 36. The Online SBML Validator Find it here http://sbml.org/Facilities/Validator
  37. 37. Where can you find more software?
  38. 38. Find software in the SBML Software Guide
  39. 39. Find software in the SBML Software Guide Find SBML software
  40. 40. Results of 2011 survey of SBML-compatible software Question: Which of the following categories best describe your software? (Check all that apply.) Out of 81 responses: • Simulation software: 42 • Analysis s/w (in addition, or instead of, simulation): 40 • Creation/model development software: 31 • Visualization/display/formatting software: 31 • Utility software (e.g., format conversion): 23 • Data integration and management software: 16 • Repository or database: 14 • Framework or library (for use in developing s/w): 13 • S/w for interactive env. (e.g., MATLAB, R, ...): 13 • Annotation software: 11
  41. 41. What about libraries for writing SBML-compatible software?
  42. 42. libSBML Reads, writes, validates SBML Can check & convert units Written in portable C++ Runs on Linux, Mac, Windows APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB Well documented API Open-source (LGPL) http://sbml.org/Software/libSBML
  43. 43. JSBML Pure Java implementation API is compatible with libSBML but more Java-like Functionality is subset of libSBML Open source (LGPL) http://sbml.org/Software/JSBML
  44. 44. How can you stay informed of new developments?
  45. 45. Resources for news, questions and discussions
  46. 46. Front-page news Resources for news, questions and discussions
  47. 47. Twitter & RSS feeds Resources for news, questions and discussions
  48. 48. Mailing lists/forums Resources for news, questions and discussions
  49. 49. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  50. 50. SBML itself provides syntax and only limited semantics
  51. 51. SBML itself provides syntax and only limited semantics No standard identifiers
  52. 52. SBML itself provides syntax and only limited semantics Low info content No standard identifiers
  53. 53. SBML itself provides syntax and only limited semantics Raw models alone are insufficient (low info content, no standard identifiers) Need standard schemes for machine-readable annotations • Identify entities • Mathematical semantics • Links to other data resources • Authorship & pub. info
  54. 54. Annotations at their simplest: element in the model → relationship qualifier (optional) → entity elsewhere (e.g., in a database)
  55. 55. Annotations add meaning and connections Annotations can answer questions: • “What exactly is the process represented by equation ‘r17’?” • “What other identities (synonyms) does this entity have?” • “What role does constant ‘k3’ play in equation ‘r17’?” • “What organism are we talking about?” • ... etc. ... Multiple annotations on same entity are common
  56. 56. SBML supports two annotation schemes SBO (Systems Biology Ontology) • For mathematical semantics • One SBML object ← one SBO term • Short, compact, tightly coupled but limited scope MIRIAM (Minimum Information Requested In the Annotation of Models) • For any kind of annotation • One SBML object ← multiple MIRIAM annotations • Larger, more free-form, wider scope Both are externalized and independent of SBML
  57. 57. Systems Biology Ontology (SBO) http://biomodels.net/sbo
  58. 58. SBML example annotated with SBO terms:
      <sbml ...>
        ...
        <listOfCompartments>
          <compartment id="cell" size="1e-15" />
        </listOfCompartments>
        <listOfSpecies>
          <species compartment="cell" id="S1" initialAmount="1000" />
          <species compartment="cell" id="S2" initialAmount="0" />
        </listOfSpecies>
        <listOfParameters>
          <parameter id="k" value="0.005" sboTerm="SBO:0000339" />
        </listOfParameters>
        <listOfReactions>
          <reaction id="r1" reversible="false">
            <listOfReactants>
              <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010" />
            </listOfReactants>
            <listOfProducts>
              <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000011" />
            </listOfProducts>
            <kineticLaw sboTerm="SBO:0000052">
              <math> ... </math>
              ...
      </sbml>
  59. 59. Same listing, highlighting the sboTerm on parameter k: SBO:0000339
  60. 60. Same listing, with the meaning of the highlighted term shown: SBO:0000339 = “forward bimolecular rate constant, continuous case”
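
As a rough illustration of how software might pick up the SBO terms shown on slides 58 to 60, the sketch below collects every `sboTerm` attribute from an XML fragment. The fragment is a shortened, well-formed adaptation of the slide's listing (the wrapping `model` element and the omissions are choices made for this example), and the label table contains only the one term the slides explain; a real tool would look terms up in SBO itself.

```python
import xml.etree.ElementTree as ET

# Shortened adaptation of the slide's listing; values are taken from the slide.
FRAGMENT = """<sbml>
  <model>
    <listOfParameters>
      <parameter id="k" value="0.005" sboTerm="SBO:0000339"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="r1" reversible="false">
        <listOfReactants>
          <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010"/>
        </listOfReactants>
        <kineticLaw sboTerm="SBO:0000052"/>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

# Only the label given on the slides; other terms are left unlabeled on purpose.
KNOWN_LABELS = {
    "SBO:0000339": "forward bimolecular rate constant, continuous case",
}


def sbo_terms(xml_text):
    """Return (element name, id or species, SBO term) in document order."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        term = element.get("sboTerm")
        if term is not None:
            name = element.tag.rsplit("}", 1)[-1]  # drop any XML namespace
            ref = element.get("id") or element.get("species")
            found.append((name, ref, term))
    return found


terms = sbo_terms(FRAGMENT)
for name, ref, term in terms:
    print(name, ref, term, KNOWN_LABELS.get(term, "(label not in this sketch)"))
```

Because each SBML object carries at most one SBO term, a flat list like this is enough; MIRIAM annotations, which can be multiple per object, need a richer structure.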
  61. 61. Software can use SBO terms to help you work with models Examples: semanticSBML, SBMLsqueezer
  62. 62. MIRIAM (Minimum Information Requested In the Annotation of Models) Addresses 2 general areas of annotation needs: • Requirements for reference correspondence • Scheme for encoding annotations: - Annotations for attributing model creators & sources - Annotations for referring to external data resources MIRIAM is not specific to SBML
  63. 63. MIRIAM (Minimum Information Requested In the Annotation of Models) Addresses 2 general areas of annotation needs: • Requirements for reference correspondence • Scheme for encoding annotations: - Annotations for attributing model creators & sources - Annotations for referring to external data resources MIRIAM is not specific to SBML
  64. 64. Annotations for attributing model creators and sources Goal: permit tracing model’s origins & people involved in its creation Minimal info required: • Name for the model • Citation for a description of what is being modeled & its author • Contact info for the model creator(s) • Creation date & time • Last modification date & time • Statement of the model’s terms of distribution - Specific terms not mandated, just a statement of the terms
  65. 65. MIRIAM (Minimum Information Requested In the Annotation of Models) Addresses 2 general areas of annotation needs: • Requirements for reference correspondence • Scheme for encoding annotations: - Annotations for attributing model creators & sources - Annotations for referring to external data resources MIRIAM is not specific to SBML
  66. 66. MIRIAM (Minimum Information Requested In the Annotation of Models) Addresses 2 general areas of annotation needs: • Requirements for reference correspondence • Scheme for encoding annotations: - Annotations for attributing model creators & sources - Annotations for referring to external data resources MIRIAM is not specific to SBML
  67. 67. Annotations for external references Goal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies) • Supports: - Precise identification of model constituents - Discovery of models that concern the same thing - Comparison of model constituents between different models MIRIAM approach avoids putting data content directly in the model; instead, it points at external resources that contain the knowledge.
  68. 68. Why might you care? Low info content. ChEBI: http://www.ebi.ac.uk/chebi
  69. 69. Why might you care? Low info content. Example from ChEBI (http://www.ebi.ac.uk/chebi): salicylic acid is known by different names. Do you want to write all of them into your model?
  70. 70. Identifying resources has its own challenges For linking to data, need: • Globally unique, unambiguous identifiers • ... that are persistent despite resource changes (e.g., changed URLs) • ... that are maintained by the community Problem: different resources have different identification schemes • E.g.: entity “16480” - In ChEBI: entry 16480 is nitrous oxide - In PubMed: entry 16480 is the 1977 paper “Effect of gallstone- dissolution therapy on human liver structure” - In PubChem: entry 16480 is 1-chloro-4-isothiocyanatobenzene
  71. 71. How do we create globally unique identifiers consistently? Long story short: • Create unique resource identifiers (URIs) by combining 2 parts: - namespace (identifies a dataset) - entity identifier (identifies a datum within the dataset) • Create registry for namespaces - Allows people & software to use same namespace identifiers • Create service for URI resolution - Allows people & software to take a given resource identifier and figure out what it points to
  72. 72. Resolving resource identifiers MIRIAM Registry supports the creation of globally unique identifiers • Example MIRIAM identifier: urn:miriam:ec-code:1.1.1.1 • Provides various data about the resource, including alternate servers • Provides web services identifiers.org is layered on top of that and provides resolvable URIs • Can type it in a web browser! • Example identifiers.org URI: http://identifiers.org/ec-code/1.1.1.1
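
The two-part scheme from slides 71 and 72 (namespace plus entity identifier) can be sketched in a few lines. The helper names below are made up for this example, and the conversion simply mirrors the pair of identifiers shown on slide 72; real MIRIAM URNs may need extra handling (for instance escaped characters) that is left out here, and the authoritative mapping is whatever the MIRIAM Registry and identifiers.org define.

```python
MIRIAM_PREFIX = "urn:miriam:"


def split_miriam_urn(urn):
    """Split 'urn:miriam:<namespace>:<entity>' into (namespace, entity)."""
    if not urn.startswith(MIRIAM_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split on the first colon only, so the entity part may itself contain colons.
    namespace, separator, entity = urn[len(MIRIAM_PREFIX):].partition(":")
    if not separator or not namespace or not entity:
        raise ValueError(f"missing namespace or entity identifier: {urn!r}")
    return namespace, entity


def to_identifiers_org(urn):
    """Build the identifiers.org-style URL for a MIRIAM URN (simplified)."""
    namespace, entity = split_miriam_urn(urn)
    return f"http://identifiers.org/{namespace}/{entity}"


url = to_identifiers_org("urn:miriam:ec-code:1.1.1.1")
print(url)

# The bare number 16480 is ambiguous (slide 70); paired with a namespace it is not.
# The namespace labels here are illustrative, not checked against the registry.
references = {("chebi", "16480"), ("pubmed", "16480"), ("pubchem", "16480")}
print(len(references), "distinct references share the entity identifier 16480")
```

Keeping the namespace and the entity identifier as separate fields, rather than one opaque string, is what lets a registry swap in a new server address without touching the models that cite the entry.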
  73. 73. BioModels Database: example of using the annotations
  74. 74. Annotations enable many interesting possibilities Example: semanticSBML Figure courtesy of Wolfram Liebermeister
  75. 75. Summary: why care about standard ways of writing annotations? Structured, machine-readable annotations increase your model’s utility • Allow more precise identification of model components - Understand model structure - Search/discover models - Compare models • Adds a semantic layer—integrates knowledge into the model - Helps recipients understand the underlying biology - Allows for better reuse of models - Supports conversion of models from one form to another
  76. 76. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  77. 77. Major dimensions of a computational model (concept due to Nicolas Le Novère) • Model representation level: mathematical semantics, biological semantics, visual interpretation • Model type: discrete stochastic entities, continuous lumped parameters, mean field approximation, state transition • Model life-cycle: model creation, model annotation, model analysis, numerical results
  78. 78. What about other kinds of models?
  79. 79. SBML Level 3: Supporting more categories of models Package W Package X Package Y Package Z SBML Level 3 Core (dependencies) An SBML Level 3 package adds constructs & capabilities Models declare which packages they use • Applications tell users which packages they support Package development can be decoupled
  80. 80. Level 3 packages and what they enable • Hierarchical composition: models containing submodels • Flux balance constraints: flux balance analysis models • Qualitative models: Petri net models, Boolean models • Spatial: nonhomogeneous spatial models • Multicomponent species: entities with structure & state; rule-based models • Graph layout: diagrams of models • Graph rendering: diagrams of models • Distribution & ranges: nonscalar values • Annotations: richer annotation syntax • Groups: arbitrary grouping of model components • Dynamic structures: creation & destruction of model components • Arrays & sets: arrays or sets of entities
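
Slide 79 says that models declare which packages they use and that applications tell users which ones they support. A minimal sketch of that check follows. It assumes each declared package carries a "required" flag, as SBML Level 3 documents do, although the slides do not show it; the function name and the short package labels (`comp`, `layout`, `render`) are choices made for this example.

```python
def split_by_support(declared, supported):
    """Compare a model's declared packages with an application's supported set.

    declared:  dict mapping package label -> required flag (True/False)
    supported: set of package labels the application understands
    Returns (blocking, ignorable): unsupported packages that are required,
    and unsupported packages that are not.
    """
    missing = [name for name in declared if name not in supported]
    blocking = sorted(name for name in missing if declared[name])
    ignorable = sorted(name for name in missing if not declared[name])
    return blocking, ignorable


# Hypothetical model using three packages; the application supports only layout.
declared = {"comp": True, "layout": False, "render": False}
supported = {"layout"}

blocking, ignorable = split_by_support(declared, supported)
print("cannot interpret the model without:", blocking)
print("can be ignored with a warning:", ignorable)
```

Separating the two cases is one way an application could tell users what it supports: a missing required package means the model cannot be interpreted faithfully, while a missing optional one only loses extra information such as diagram layout.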
  81. 81. How can we capture the simulation/analysis procedures?
  82. 82. Software can’t read figure legends Example: Decroly & Goldbeter, PNAS, 1982 (BIOMD0000000319 in BioModels Database)
  83. 83. SED-ML = Simulation Experiment Description ML Application-independent format to capture procedures, algorithms, parameter values • Neutral format for encoding the steps to go from model to output Can be used for • Simulation experiments encoding parametrizations & perturbations • Simulations using more than one model • Simulations using more than one method • Data manipulations to produce plot(s) libSedML project developing API library http://www.biomodels.net/sed-ml
  84. 84. What about visual diagrams?
  85. 85. Graphical representation of models Today: broad variation in graphical notation used in biological diagrams • Between authors, between journals, even people in same group However, standard notations would offer benefits: • Consistency = easier to read diagrams with less ambiguity • Software support: verification of correctness, translation to math
  86. 86. SBGN = Systems Biology Graphical Notation Goal: standardize the graphical notation in diagrams of biological processes • Community-based development, à la SBML Many groups participating 3 sublanguages to describe different facets of a model http://sbgn.org
  87. 87. Outline • General background and motivations • Brief summary of SBML features • A selection of resources for the SBML-oriented modeler • Annotations, connections and semantics • Current and upcoming developments in community standards • Closing
  88. 88. Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010 Such standards are the work of a great community
  89. 89. Get involved and make things better! COMBINE (Computational Modeling in Biology Network) • SBML, SBGN, BioPAX, SED-ML, CellML, NeuroML http://co.mbine.org Upcoming meeting: August 15–19 in Toronto, Canada • Right before ICSB (International Conference on Systems Biology)
  90. 90. URLs • SBML: http://sbml.org • BioModels Database: http://biomodels.net/biomodels • COMBINE: http://co.mbine.org • identifiers.org: http://identifiers.org • MIRIAM: http://biomodels.net/miriam • SED-ML: http://biomodels.net/sed-ml • SBO: http://biomodels.net/sbo • SBGN: http://sbgn.org
  91. 91. I’d like your feedback! You can use this anonymous form: http://tinyurl.com/mhuckafeedback
  92. 92. SBML was made possible thanks to funding from: • National Institute of General Medical Sciences (USA) • European Molecular Biology Laboratory (EMBL) • JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003) • JST ERATO-SORST Program (Japan) • ELIXIR (UK) • Beckman Institute, Caltech (USA) • Keio University (Japan) • International Joint Research Program of NEDO (Japan) • Japanese Ministry of Agriculture • Japanese Ministry of Educ., Culture, Sports, Science and Tech. • BBSRC (UK) • National Science Foundation (USA) • DARPA IPTO Bio-SPICE Bio-Computation Program (USA) • Air Force Office of Scientific Research (USA) • STRI, University of Hertfordshire (UK) • Molecular Sciences Institute (USA)
