SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Downloaden Sie, um offline zu lesen
Creating a new language to support
open innovation
Michael Hucka, Ph.D.
Department of Computing + Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
BioBrieļ¬ng ā€“Ā BioMelbourne Network, Australia, August 2013
Email: mhucka@caltech.edu Twitter: @mhucka
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
Research today: experimentation, computation, cogitation
ā€œ The nature of systems biologyā€
Bruggeman & Westerhoļ¬€,
Trends Microbiol. 15 (2007).
Large-scale integrative models are growing
Many models have traditionally been published this way
Problems:
ā€¢ Errors in printing
ā€¢ Missing information
ā€¢ Dependencies on
implementation
ā€¢ Outright errors
ā€¢ Can be a huge
eļ¬€ort to recreate
Is it enough to communicate the model in a paper?
Experiences from BioModels Database
BioModels Database:
ā€¢ Public database of published computational models in biology
ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated
- If not available in electronic form, they encode it from the paper
Their experiences?
ā€¢ Vast majority of models encoded directly from the publication
did not work as published
- Often (not always) due to common errorsĀ ā€“Ā typos, omissions
ā€¢ Success rate improved in recent years thanks to more people
providing their models in electronic formats
More is needed to makeĀ computational results reproducible
Experiences from BioModels Database
BioModels Database:
ā€¢ Public database of published computational models in biology
ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated
- If not available in electronic form, they encode it from the paper
Their experiences?
ā€¢ Vast majority of models encoded directly from the publication
did not work as published
- Often (not always) due to common errorsĀ ā€“Ā typos, omissions
ā€¢ Success rate improved in recent years thanks to more people
providing their models in electronic formats
More is needed to makeĀ computational results reproducible
http://biomodels.net/biomodels
Experiences from BioModels Database
BioModels Database:
ā€¢ Public database of published computational models in biology
ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated
- If not available in electronic form, they encode it from the paper
Their experiences?
ā€¢ Vast majority of models encoded directly from the publication
did not work as published
- Often (not always) due to common errorsĀ ā€“Ā typos, omissions
ā€¢ Success rate improved in recent years thanks to more people
providing their models in electronic formats
More is needed to makeĀ computational results reproducible
Is it enough to make your (software X) code available?
Itā€™s vital for good science:
ā€¢ Someone with access to the same software can try to run it,
understand it, verify the computational results, build on them, etc.
ā€¢ Opinion: you should always do this in any case
Is it enough to make your (software X) code available?
Itā€™s vital for good scienceā€”
ā€¢ Someone with access to the same software can try to run it,
understand it, build on it, etc.
ā€¢ Opinion: you should always do this in any case
But itā€™s still not ideal for communication of scientiļ¬c results:
ā€¢ What if they donā€™t have access to the same software?
ā€¢ What if they donā€™t want to use that software?
ā€¢ What if they want to use a diļ¬€erent conceptual framework?
ā€¢ And how will people be able to relate the model to other work?
Diļ¬€erent tools diļ¬€erent interfaces & languages
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
SBML:alinguafranca
forsoftware
Format for representing computational models of biological processes
ā€¢ Data structures + usage principles + serialization to XML
ā€¢ (Mostly) Declarative, not proceduralā€”not a scripting language
Neutral with respect to modeling framework
ā€¢ E.g., ODE, stochastic systems, etc.
Important: software reads/writes SBML, not humans
SBML = Systems Biology Markup Language
The raw SBML
The process is central
ā€¢ Literally called aā€œreactionā€in SBML
ā€¢ Participants are pools of entities (biochemical species)
Models can further include:
ā€¢ Compartments
ā€¢ Other constants & variables
ā€¢ Discontinuous events
ā€¢ Other, explicit math
Core SBML concepts are fairly simple
ā€¢ Unit deļ¬nitions
ā€¢ Annotations
SBML is now widely used
Dozens of journals accept models in SBML format
100ā€™s of software tools available today
1000ā€™s of models available in SBML format today
0
100
200
300
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
254+ today
Contents of BioModels Database
Contents today:
ā€¢ 142,000+ pathway models (converted from KEGG)
ā€¢ 460+ hand-curated quantitative models
ā€¢ 460+ non-curated quantitative models
8%
2%
3%
6%
6%
7%
8%
9%
24%
27%
signal transduction
metabolic process
multicelullar organismal process
rhythmic process
cell cycle
homeostatic process
response to stimulus
cell death
localization
others (e.g., developmental process)
Database data from 2013
Free software libraries ā€“Ā libSBML
Reads, writes, validates SBML
Can check & convert units
Written in portable C++
Runs on Linux, Mac, Windows
APIs for C, C++, C#, Java, Octave,
Perl, Python, R, Ruby, MATLAB
Well documented API
Open-source (LGPL)
http://sbml.org/Software/libSBML
Free software libraries ā€“ JSBML
Pure Java implementation
API is compatible with libSBML but
more Java-like
Functionality is subset of libSBML
Open source (LGPL)
http://sbml.org/Software/JSBML
Evolution of SBML continues
Today: SBML Level 3
ā€¢ Level 3 Core provides framework for common models
ā€¢ Level 3 packages add additional constructs to the Core
Level 3 package What it enables
Hierarchical model composition Models containing submodels āœ”
Flux balance constraints Constraint-based models āœ”
Qualitative models Petri net models, Boolean models āœ”
Graph layout Diagrams of models āœ”
Multicomponent/state species Entities w/ structure; also rule-based models draft
Spatial Nonhomogeneous spatial models draft
Graph rendering Diagrams of models draft
Groups Arbitrary grouping of components draft
Distributions Numerical values as statistical distributions in dev
Arrays & sets Arrays or sets of entities in dev
Dynamic structures Creation & destruction of components in dev
Annotations Richer annotation syntax
Status
NationalInstituteofGeneralMedicalSciences(USA)
European Molecular Biology Laboratory (EMBL)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)
SBML funding sources over the past 13+ years
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
Mathematical semantics
Biological semantics
Visual interpretation
Discrete stochastic entities
Continuous lumped parameter
State transition
Meanfieldapproximation
Model type
Model creation
Model annotation
Model analysis
Numericalresults
Model life-cycle
Model representation level
COMBINE eļ¬€orts cover diļ¬€erent facets of modeling
...
ConceptduetoNicolasLeNovĆØre
Modelerswanttousetheirownconventions
Modelerswanttousetheirownconventions
No standard
identiļ¬ers
Modelerswanttousetheirownconventions
Low info
content
No standard
identiļ¬ers
Raw models alone are insuļ¬ƒcient
Need standard schemes for
machine-readable annotations
ā€¢ Identify entities
ā€¢ Mathematical semantics
ā€¢ Links to other data resources
ā€¢ Authorship & pub. info
Modelerswanttousetheirownconventions
Low info
content
No standard
identiļ¬ers
Addresses 2 general areas of annotation needs:
MIRIAM is not speciļ¬c to SBML
MIRIAM(MinimumInformationRequestedIntheAnnotationofModels)
Requirements for
reference correspondence
Scheme for encoding
annotations
Annotations for
attributing model
creators & sources
Annotations for
referring to external
data resources
Addresses 2 general areas of annotation needs:
MIRIAM is not speciļ¬c to SBML
MIRIAM(MinimumInformationRequestedIntheAnnotationofModels)
Requirements for
reference correspondence
Scheme for encoding
annotations
Annotations for
attributing model
creators & sources
Annotations for
referring to external
data resources
Annotations for
referring to external
data resources
Example of a problem that can be solved with annotations
http://www.ebi.ac.uk/chebi
Low info
content
Example of a problem that can be solved with annotations
http://www.ebi.ac.uk/chebi
Low info
content
Known by different namesĀ ā€“Ā 
do you want to write all of
them into your model?
salicylic acid
MIRIAM annotations for external references
Goal: link model constituents to corresponding entities in
bioinformatics resources (e.g., databases, controlled vocabularies)
ā€¢ Supports:
- Precise identiļ¬cation of model constituents
- Discovery of models that concern the same thing
- Comparison of model constituents between diļ¬€erent models
MIRIAM approach avoids putting data content directly in the model
ā€¢ Instead, it points at external resources that contain the data
How do we create globally unique identiļ¬ers consistently?
Long story shortā€”developed by the Le NovĆØre group at the EBI
ā€¢ Resource identiļ¬ers (URIs) combine 2 parts:
ā€¢ Thereā€™s a registry for namespaces: MIRIAM Registry
- Allows people & software to use same namespace identiļ¬ers
ā€¢ Thereā€™s a URI resolution service: MIRIAM Resources & identifiers.org
- Allows people & software to take a given identiļ¬er and ļ¬gure
out what it points to
namespace entity identiļ¬er
{
{
Identiļ¬es a dataset Identiļ¬es a datum
within the dataset
Another problem: software canā€™t read ļ¬gure legends
?
BIOMD0000000319 in BioModels Database
Decroly & Goldbeter, PNAS, 1982
SED-ML = Simulation Experiment Description ML
Application-independent format
ā€¢Captures procedures, algorithms, parameter values
Can be used for
ā€¢Simulation experiments encoding parametrizations & perturbations
ā€¢Simulations using more than one model and/or method
ā€¢Data manipulations to produce plot(s)
http://sedml.org
Simulation
Model
Task Data generators
Reports
Eļ¬€orts like SED-ML improve reproducibility of publications
Waltemath et al.,
BMC Sys Bio 5, 2011.
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
Need interoperable formats, but developing them is not easy
Need people with diverse set of knowledge & skills
ā€¢ Scientiļ¬c needs
ā€¢ Technical implementation skills
ā€¢ Practical experience
Need manage multiple phases of a standardization eļ¬€ort
ā€¢ Creation
ā€¢ Evolution
ā€¢ Support
Need interoperable formats, but developing them is not easy
Need people with diverse set of knowledge & skills
ā€¢ Scientiļ¬c needs
ā€¢ Technical implementation skills
ā€¢ Practical experience
Need manage multiple phases of a standardization eļ¬€ort
ā€¢ Creation
ā€¢ Evolution
ā€¢ Support
} This is just for the speciļ¬cation of the
standards, to say nothing of the necessary
software and other infrastructure!
Realizations about the state of aļ¬€airs in late-2000ā€™s
ā€¢ Many standardization eļ¬€orts overlapped, but lacked coordination
ā€¢ Eļ¬€orts were inventing their own processes from scratch
ā€¢ Many individual meetings meant more travel for many people
ā€¢ Limited and fragile funding didnā€™t support solid, coherent base
COMBINE = Computational Modeling in Biology Network
ā€¢ Coordinate standards development
ā€¢ Develop common procedures & tools (but not impose them!)
ā€¢ Coordinate meetings
ā€¢ Provide a recognized voice
Motivations for the creation of COMBINE
Standardization eļ¬€orts represented in COMBINE today
BioPAX
Qualiļ¬ers
GPML
COMBINE Standards
Associated Standardization Eļ¬€orts
Related Standardization Eļ¬€orts
Those are the products of successful, open collaborations!
Examples of community organization
Two main annual meetings, plus ad hoc workshops
ā€¢ COMBINE meeting: status updates, presentations, outreach
- NextĀ COMBINE: Paris, Sep 16ā€“20, 2013
ā€¢ HARMONY: Hackathon on Resources for Modeling in Biology
- Software development, interoperability hacking
COMBINE 2012, TorontoCOMBINE 2011, Heidelberg
What motivates people to do this?
Solving a problem for yourself/your closed group is easier and quicker
ā€¢ So what are these people getting out of it?
Some advantages of an open, community-oriented approach:
ā€¢ Finding better solutions they wouldnā€™t ļ¬nd alone
- Arguments Discussions leads to realizations & better solutions
ā€¢ Contributions to science ā€“Ā publications, peer recognition
ā€¢ Support of a standard makes their software more desirable
ā€¢ Sense of community involvement
Some admitted disadvantages:
ā€¢ Agreement takes time ā€“Ā progress can be very slow
ā€¢ Solutions may include features you didnā€™t plan on, or need
COMBINE is open to allā€”and COMBINE needs you!
http://co.mbine.org
Current coordinators:
ā€¢ Nicolas Le NovĆØre, Mike Hucka, Falk Schreiber, Gary Bader
Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary eļ¬€orts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion
Time it well
ā€¢ Too early and too late are bad
Start with actual stakeholders
ā€¢ Address real needs, not perceived ones
Start with small team of dedicated developers
ā€¢ Can work faster, more focused; also avoidsā€œdesigned-by-committeeā€
Engage people constantly, in many ways
ā€¢ Electronic forums, email, electronic voting, surveys, hackathons
Make the results free and open-source
ā€¢ Makes people comfortable knowing it will always be available
Be creative about seeking funding
Some things we (maybe?) got right with SBML
Not waiting for implementations before freezing speciļ¬cations
ā€¢ Sometimes ļ¬nalized speciļ¬cation before implementations tested it
- Especially bad when we failed to do a good job
ā€£ E.g.,ā€œforward thinkingā€features, orā€œelegantā€designs
Not formalizing the development process suļ¬ƒciently
ā€¢ Especially early in the history, did not have a very open process
Not resolving intellectual property issues from the beginning
ā€¢ Industrial users askā€œwho has the right to give any rights to this?ā€
Some things we certainly got wrong
Nicolas Le NovĆØre, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler,
Nico Rodriguez, Marco Donizelli,Viji Chelliah, MĆ©lanie Courtot, Harish Dharuri
Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010
John C. Doyle, Hiroaki Kitano
Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney,
Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi,
Akiya Juraku, Ben Kovitz
OriginalPIā€™s:
SBMLTeam:
SBMLEditors:
BioModelsDB:
Mike Hucka, Nicolas Le NovĆØre, Sarah Keating, Frank Bergmann, Lucian Smith,
Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, DarrenWilkinson
And a huge thanks to many others in the COMBINE community
This work was made possibleĀ thanks to a great community
SBML http://sbml.org
BioModels Database http://biomodels.net/biomodels
MIRIAM http://biomodels.net/miriam
identiļ¬ers.org http://identiļ¬ers.org
SED-ML http://biomodels.net/sed-ml
SBO http://biomodels.net/sbo
SBGN http://sbgn.org
COMBINE http://co.mbine.org
URLs
Iā€™d like your feedback!
You can use this anonymous form:
http://tinyurl.com/mhuckafeedback

Weitere Ƥhnliche Inhalte

Ƅhnlich wie Creating a new language to support open innovation

Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati
Ā 
SADP PPTs of all modules - Shanthi D.L.pdf
SADP PPTs of all modules - Shanthi D.L.pdfSADP PPTs of all modules - Shanthi D.L.pdf
SADP PPTs of all modules - Shanthi D.L.pdf
B.T.L.I.T
Ā 
20090219 The case for another systems biology modelling environment
20090219 The case for another systems biology modelling environment20090219 The case for another systems biology modelling environment
20090219 The case for another systems biology modelling environment
Jonathan Blakes
Ā 

Ƅhnlich wie Creating a new language to support open innovation (20)

Computational Approaches to Systems Biology
Computational Approaches to Systems BiologyComputational Approaches to Systems Biology
Computational Approaches to Systems Biology
Ā 
StandardsĀ and software: practical aids for reproducibility of computational r...
StandardsĀ and software: practical aids for reproducibility of computational r...StandardsĀ and software: practical aids for reproducibility of computational r...
StandardsĀ and software: practical aids for reproducibility of computational r...
Ā 
Common ground between modelers and simulation software: the Systems Biology M...
Common ground between modelers and simulation software: the Systems Biology M...Common ground between modelers and simulation software: the Systems Biology M...
Common ground between modelers and simulation software: the Systems Biology M...
Ā 
A status update on COMBINE standardization activities, with a focus on SBML
A status update on COMBINE standardization activities, with a focus on SBMLA status update on COMBINE standardization activities, with a focus on SBML
A status update on COMBINE standardization activities, with a focus on SBML
Ā 
SBML, SBML Packages, SED-ML, ā€Ø COMBINE Archive, and more
SBML, SBML Packages, SED-ML, ā€Ø COMBINE Archive, and moreSBML, SBML Packages, SED-ML, ā€Ø COMBINE Archive, and more
SBML, SBML Packages, SED-ML, ā€Ø COMBINE Archive, and more
Ā 
Michael Hucka.ppt
Michael Hucka.pptMichael Hucka.ppt
Michael Hucka.ppt
Ā 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
Ā 
Some SBML-related resources at SBML.org
Some SBML-related resources at SBML.orgSome SBML-related resources at SBML.org
Some SBML-related resources at SBML.org
Ā 
Recent software and services to support the SBML community
Recent software and services to support the SBML community Recent software and services to support the SBML community
Recent software and services to support the SBML community
Ā 
Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?
Ā 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Ā 
SADP PPTs of all modules - Shanthi D.L.pdf
SADP PPTs of all modules - Shanthi D.L.pdfSADP PPTs of all modules - Shanthi D.L.pdf
SADP PPTs of all modules - Shanthi D.L.pdf
Ā 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
Ā 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Ā 
The Path to Digital Engineering
The Path to Digital EngineeringThe Path to Digital Engineering
The Path to Digital Engineering
Ā 
20090219 The case for another systems biology modelling environment
20090219 The case for another systems biology modelling environment20090219 The case for another systems biology modelling environment
20090219 The case for another systems biology modelling environment
Ā 
Iwesep19.ppt
Iwesep19.pptIwesep19.ppt
Iwesep19.ppt
Ā 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoML
Ā 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
Ā 
Model Management in Systems Biology: Challenges ā€“ Approaches ā€“ Solutions
Model Management in Systems Biology: Challenges ā€“ Approaches ā€“ SolutionsModel Management in Systems Biology: Challenges ā€“ Approaches ā€“ Solutions
Model Management in Systems Biology: Challenges ā€“ Approaches ā€“ Solutions
Ā 

Mehr von Mike Hucka

Mehr von Mike Hucka (15)

Caltech DIBS: Digital Borrowing System
Caltech DIBS: Digital Borrowing SystemCaltech DIBS: Digital Borrowing System
Caltech DIBS: Digital Borrowing System
Ā 
Finding the right wheel
Finding the right wheelFinding the right wheel
Finding the right wheel
Ā 
Introduction to Satellite Meeting on Overview and Use of Standards and Format...
Introduction to Satellite Meeting on Overview and Use of Standards and Format...Introduction to Satellite Meeting on Overview and Use of Standards and Format...
Introduction to Satellite Meeting on Overview and Use of Standards and Format...
Ā 
What is "COMBINE"?
What is "COMBINE"?What is "COMBINE"?
What is "COMBINE"?
Ā 
Reproducibility of computational research: methods to avoid madness (Session ...
Reproducibility of computational research: methods to avoid madness (Session ...Reproducibility of computational research: methods to avoid madness (Session ...
Reproducibility of computational research: methods to avoid madness (Session ...
Ā 
Update on SBML for Tuesday Sep. 17 (COMBINE 2013)
Update on SBML for Tuesday Sep. 17 (COMBINE 2013)Update on SBML for Tuesday Sep. 17 (COMBINE 2013)
Update on SBML for Tuesday Sep. 17 (COMBINE 2013)
Ā 
Systems Biology Systems
Systems Biology SystemsSystems Biology Systems
Systems Biology Systems
Ā 
Retrospective about SBML on the occasion of the 10th Anniversary of SBML
Retrospective about SBML on the occasion of the 10th Anniversary of SBMLRetrospective about SBML on the occasion of the 10th Anniversary of SBML
Retrospective about SBML on the occasion of the 10th Anniversary of SBML
Ā 
SBML and related resources ā€Øand standardization efforts
SBML and related resources ā€Øand standardization effortsSBML and related resources ā€Øand standardization efforts
SBML and related resources ā€Øand standardization efforts
Ā 
SBML (the Systems Biology Markup Language), BioModels Database, and related r...
SBML (the Systems Biology Markup Language), BioModels Database, and related r...SBML (the Systems Biology Markup Language), BioModels Database, and related r...
SBML (the Systems Biology Markup Language), BioModels Database, and related r...
Ā 
Finding common ground between modelers and simulation software in systems bio...
Finding common ground between modelers and simulation software in systems bio...Finding common ground between modelers and simulation software in systems bio...
Finding common ground between modelers and simulation software in systems bio...
Ā 
SBML (the Systems Biology Markup Language)
SBML (the Systems Biology Markup Language)SBML (the Systems Biology Markup Language)
SBML (the Systems Biology Markup Language)
Ā 
General updates about SBML and SBML Team activities
General updates about SBML and SBML Team activitiesGeneral updates about SBML and SBML Team activities
General updates about SBML and SBML Team activities
Ā 
SBML: What Is It About?
SBML: What Is It About?SBML: What Is It About?
SBML: What Is It About?
Ā 
Software for SBML Today
Software for SBML TodaySoftware for SBML Today
Software for SBML Today
Ā 

KĆ¼rzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
Ā 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
Christopher Logan Kennedy
Ā 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
Ā 

KĆ¼rzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
Ā 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
Ā 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Ā 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
Ā 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
Ā 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
Ā 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
Ā 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Ā 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
Ā 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Ā 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Ā 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
Ā 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Ā 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Ā 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
Ā 
Elevate Developer Efficiency & build GenAI Application with Amazon Qā€‹
Elevate Developer Efficiency & build GenAI Application with Amazon Qā€‹Elevate Developer Efficiency & build GenAI Application with Amazon Qā€‹
Elevate Developer Efficiency & build GenAI Application with Amazon Qā€‹
Ā 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Ā 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Ā 

Creating a new language to support open innovation

  • 1. Creating a new language to support open innovation Michael Hucka, Ph.D. Department of Computing + Mathematical Sciences California Institute of Technology Pasadena, CA, USA BioBrieļ¬ng ā€“Ā BioMelbourne Network, Australia, August 2013 Email: mhucka@caltech.edu Twitter: @mhucka
  • 2. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 3. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 4. Research today: experimentation, computation, cogitation
  • 5. ā€œ The nature of systems biologyā€ Bruggeman & Westerhoļ¬€, Trends Microbiol. 15 (2007).
  • 7. Many models have traditionally been published this way Problems: ā€¢ Errors in printing ā€¢ Missing information ā€¢ Dependencies on implementation ā€¢ Outright errors ā€¢ Can be a huge eļ¬€ort to recreate Is it enough to communicate the model in a paper?
  • 8. Experiences from BioModels Database BioModels Database: ā€¢ Public database of published computational models in biology ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated - If not available in electronic form, they encode it from the paper Their experiences? ā€¢ Vast majority of models encoded directly from the publication did not work as published - Often (not always) due to common errorsĀ ā€“Ā typos, omissions ā€¢ Success rate improved in recent years thanks to more people providing their models in electronic formats More is needed to makeĀ computational results reproducible
  • 9. Experiences from BioModels Database BioModels Database: ā€¢ Public database of published computational models in biology ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated - If not available in electronic form, they encode it from the paper Their experiences? ā€¢ Vast majority of models encoded directly from the publication did not work as published - Often (not always) due to common errorsĀ ā€“Ā typos, omissions ā€¢ Success rate improved in recent years thanks to more people providing their models in electronic formats More is needed to makeĀ computational results reproducible http://biomodels.net/biomodels
  • 10. Experiences from BioModels Database BioModels Database: ā€¢ Public database of published computational models in biology ā€¢ Many models are curated ā€“Ā i.e., made to work & annotated - If not available in electronic form, they encode it from the paper Their experiences? ā€¢ Vast majority of models encoded directly from the publication did not work as published - Often (not always) due to common errorsĀ ā€“Ā typos, omissions ā€¢ Success rate improved in recent years thanks to more people providing their models in electronic formats More is needed to makeĀ computational results reproducible
  • 11. Is it enough to make your (software X) code available? Itā€™s vital for good science: ā€¢ Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc. ā€¢ Opinion: you should always do this in any case
  • 12. Is it enough to make your (software X) code available? Itā€™s vital for good scienceā€” ā€¢ Someone with access to the same software can try to run it, understand it, build on it, etc. ā€¢ Opinion: you should always do this in any case But itā€™s still not ideal for communication of scientiļ¬c results: ā€¢ What if they donā€™t have access to the same software? ā€¢ What if they donā€™t want to use that software? ā€¢ What if they want to use a diļ¬€erent conceptual framework? ā€¢ And how will people be able to relate the model to other work?
  • 13. Diļ¬€erent tools diļ¬€erent interfaces & languages
  • 14. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 16. Format for representing computational models of biological processes ā€¢ Data structures + usage principles + serialization to XML ā€¢ (Mostly) Declarative, not proceduralā€”not a scripting language Neutral with respect to modeling framework ā€¢ E.g., ODE, stochastic systems, etc. Important: software reads/writes SBML, not humans SBML = Systems Biology Markup Language
  • 18. The process is central ā€¢ Literally called aā€œreactionā€in SBML ā€¢ Participants are pools of entities (biochemical species) Models can further include: ā€¢ Compartments ā€¢ Other constants & variables ā€¢ Discontinuous events ā€¢ Other, explicit math Core SBML concepts are fairly simple ā€¢ Unit deļ¬nitions ā€¢ Annotations
  • 19. SBML is now widely used Dozens of journals accept models in SBML format 100ā€™s of software tools available today 1000ā€™s of models available in SBML format today 0 100 200 300 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 254+ today
  • 20. Contents of BioModels Database Contents today: ā€¢ 142,000+ pathway models (converted from KEGG) ā€¢ 460+ hand-curated quantitative models ā€¢ 460+ non-curated quantitative models 8% 2% 3% 6% 6% 7% 8% 9% 24% 27% signal transduction metabolic process multicelullar organismal process rhythmic process cell cycle homeostatic process response to stimulus cell death localization others (e.g., developmental process) Database data from 2013
  • 21. Free software libraries ā€“Ā libSBML Reads, writes, validates SBML Can check & convert units Written in portable C++ Runs on Linux, Mac, Windows APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB Well documented API Open-source (LGPL) http://sbml.org/Software/libSBML
  • 22. Free software libraries ā€“ JSBML Pure Java implementation API is compatible with libSBML but more Java-like Functionality is subset of libSBML Open source (LGPL) http://sbml.org/Software/JSBML
  • 23. Evolution of SBML continues Today: SBML Level 3 ā€¢ Level 3 Core provides framework for common models ā€¢ Level 3 packages add additional constructs to the Core
  • 24. Level 3 package What it enables Hierarchical model composition Models containing submodels āœ” Flux balance constraints Constraint-based models āœ” Qualitative models Petri net models, Boolean models āœ” Graph layout Diagrams of models āœ” Multicomponent/state species Entities w/ structure; also rule-based models draft Spatial Nonhomogeneous spatial models draft Graph rendering Diagrams of models draft Groups Arbitrary grouping of components draft Distributions Numerical values as statistical distributions in dev Arrays & sets Arrays or sets of entities in dev Dynamic structures Creation & destruction of components in dev Annotations Richer annotation syntax Status
  • 25. NationalInstituteofGeneralMedicalSciences(USA) European Molecular Biology Laboratory (EMBL) JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003) JST ERATO-SORST Program (Japan) ELIXIR (UK) Beckman Institute, Caltech (USA) Keio University (Japan) International Joint Research Program of NEDO (Japan) Japanese Ministry of Agriculture Japanese Ministry of Educ., Culture, Sports, Science and Tech. BBSRC (UK) National Science Foundation (USA) DARPA IPTO Bio-SPICE Bio-Computation Program (USA) Air Force Office of Scientific Research (USA) STRI, University of Hertfordshire (UK) Molecular Sciences Institute (USA) SBML funding sources over the past 13+ years
  • 26. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 27. Mathematical semantics Biological semantics Visual interpretation Discrete stochastic entities Continuous lumped parameter State transition Meanfieldapproximation Model type Model creation Model annotation Model analysis Numericalresults Model life-cycle Model representation level COMBINE eļ¬€orts cover diļ¬€erent facets of modeling ... ConceptduetoNicolasLeNovĆØre
  • 31. Raw models alone are insuļ¬ƒcient Need standard schemes for machine-readable annotations ā€¢ Identify entities ā€¢ Mathematical semantics ā€¢ Links to other data resources ā€¢ Authorship & pub. info Modelerswanttousetheirownconventions Low info content No standard identiļ¬ers
  • 32. Addresses 2 general areas of annotation needs: MIRIAM is not speciļ¬c to SBML MIRIAM(MinimumInformationRequestedIntheAnnotationofModels) Requirements for reference correspondence Scheme for encoding annotations Annotations for attributing model creators & sources Annotations for referring to external data resources
  • 33. Addresses 2 general areas of annotation needs: MIRIAM is not speciļ¬c to SBML MIRIAM(MinimumInformationRequestedIntheAnnotationofModels) Requirements for reference correspondence Scheme for encoding annotations Annotations for attributing model creators & sources Annotations for referring to external data resources Annotations for referring to external data resources
  • 34. Example of a problem that can be solved with annotations http://www.ebi.ac.uk/chebi Low info content
  • 35. Example of a problem that can be solved with annotations http://www.ebi.ac.uk/chebi Low info content Known by different namesĀ ā€“Ā  do you want to write all of them into your model? salicylic acid
  • 36. MIRIAM annotations for external references Goal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies) ā€¢ Supports: - Precise identiļ¬cation of model constituents - Discovery of models that concern the same thing - Comparison of model constituents between diļ¬€erent models MIRIAM approach avoids putting data content directly in the model ā€¢ Instead, it points at external resources that contain the data
  • 37. How do we create globally unique identiļ¬ers consistently? Long story shortā€”developed by the Le NovĆØre group at the EBI ā€¢ Resource identiļ¬ers (URIs) combine 2 parts: ā€¢ Thereā€™s a registry for namespaces: MIRIAM Registry - Allows people & software to use same namespace identiļ¬ers ā€¢ Thereā€™s a URI resolution service: MIRIAM Resources & identifiers.org - Allows people & software to take a given identiļ¬er and ļ¬gure out what it points to namespace entity identiļ¬er { { Identiļ¬es a dataset Identiļ¬es a datum within the dataset
  • 38. Another problem: software canā€™t read ļ¬gure legends ? BIOMD0000000319 in BioModels Database Decroly & Goldbeter, PNAS, 1982
  • 39. SED-ML = Simulation Experiment Description ML Application-independent format ā€¢Captures procedures, algorithms, parameter values Can be used for ā€¢Simulation experiments encoding parametrizations & perturbations ā€¢Simulations using more than one model and/or method ā€¢Data manipulations to produce plot(s) http://sedml.org Simulation Model Task Data generators Reports
  • 40. Eļ¬€orts like SED-ML improve reproducibility of publications Waltemath et al., BMC Sys Bio 5, 2011.
  • 41. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 42. Need interoperable formats, but developing them is not easy Need people with diverse set of knowledge & skills ā€¢ Scientiļ¬c needs ā€¢ Technical implementation skills ā€¢ Practical experience Need manage multiple phases of a standardization eļ¬€ort ā€¢ Creation ā€¢ Evolution ā€¢ Support
  • 43. Need interoperable formats, but developing them is not easy Need people with diverse set of knowledge & skills ā€¢ Scientiļ¬c needs ā€¢ Technical implementation skills ā€¢ Practical experience Need manage multiple phases of a standardization eļ¬€ort ā€¢ Creation ā€¢ Evolution ā€¢ Support } This is just for the speciļ¬cation of the standards, to say nothing of the necessary software and other infrastructure!
  • 44. Realizations about the state of aļ¬€airs in late-2000ā€™s ā€¢ Many standardization eļ¬€orts overlapped, but lacked coordination ā€¢ Eļ¬€orts were inventing their own processes from scratch ā€¢ Many individual meetings meant more travel for many people ā€¢ Limited and fragile funding didnā€™t support solid, coherent base COMBINE = Computational Modeling in Biology Network ā€¢ Coordinate standards development ā€¢ Develop common procedures & tools (but not impose them!) ā€¢ Coordinate meetings ā€¢ Provide a recognized voice Motivations for the creation of COMBINE
  • 45. Standardization eļ¬€orts represented in COMBINE today BioPAX Qualiļ¬ers GPML COMBINE Standards Associated Standardization Eļ¬€orts Related Standardization Eļ¬€orts
  • 46. Those are the products of successful, open collaborations!
  • 47. Examples of community organization Two main annual meetings, plus ad hoc workshops ā€¢ COMBINE meeting: status updates, presentations, outreach - NextĀ COMBINE: Paris, Sep 16ā€“20, 2013 ā€¢ HARMONY: Hackathon on Resources for Modeling in Biology - Software development, interoperability hacking COMBINE 2012, TorontoCOMBINE 2011, Heidelberg
  • 48. What motivates people to do this? Solving a problem for yourself/your closed group is easier and quicker ā€¢ So what are these people getting out of it? Some advantages of an open, community-oriented approach: ā€¢ Finding better solutions they wouldnā€™t ļ¬nd alone - Arguments Discussions leads to realizations & better solutions ā€¢ Contributions to science ā€“Ā publications, peer recognition ā€¢ Support of a standard makes their software more desirable ā€¢ Sense of community involvement Some admitted disadvantages: ā€¢ Agreement takes time ā€“Ā progress can be very slow ā€¢ Solutions may include features you didnā€™t plan on, or need
  • 49. COMBINE is open to allā€”and COMBINE needs you! http://co.mbine.org Current coordinators: ā€¢ Nicolas Le NovĆØre, Mike Hucka, Falk Schreiber, Gary Bader
  • 50. Outline Background and introduction The Systems Biology Markup Language (SBML) Complementary eļ¬€orts: MIRIAM and SED-ML COMBINE: the Computational Modeling in Biology Network Conclusion
  • 51. Time it well ā€¢ Too early and too late are bad Start with actual stakeholders ā€¢ Address real needs, not perceived ones Start with small team of dedicated developers ā€¢ Can work faster, more focused; also avoidsā€œdesigned-by-committeeā€ Engage people constantly, in many ways ā€¢ Electronic forums, email, electronic voting, surveys, hackathons Make the results free and open-source ā€¢ Makes people comfortable knowing it will always be available Be creative about seeking funding Some things we (maybe?) got right with SBML
  • 52. Not waiting for implementations before freezing speciļ¬cations ā€¢ Sometimes ļ¬nalized speciļ¬cation before implementations tested it - Especially bad when we failed to do a good job ā€£ E.g.,ā€œforward thinkingā€features, orā€œelegantā€designs Not formalizing the development process suļ¬ƒciently ā€¢ Especially early in the history, did not have a very open process Not resolving intellectual property issues from the beginning ā€¢ Industrial users askā€œwho has the right to give any rights to this?ā€ Some things we certainly got wrong
  • 53. Nicolas Le NovĆØre, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler, Nico Rodriguez, Marco Donizelli,Viji Chelliah, MĆ©lanie Courtot, Harish Dharuri Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010 John C. Doyle, Hiroaki Kitano Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi, Akiya Juraku, Ben Kovitz OriginalPIā€™s: SBMLTeam: SBMLEditors: BioModelsDB: Mike Hucka, Nicolas Le NovĆØre, Sarah Keating, Frank Bergmann, Lucian Smith, Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, DarrenWilkinson And a huge thanks to many others in the COMBINE community This work was made possibleĀ thanks to a great community
  • 54. SBML http://sbml.org BioModels Database http://biomodels.net/biomodels MIRIAM http://biomodels.net/miriam identiļ¬ers.org http://identiļ¬ers.org SED-ML http://biomodels.net/sed-ml SBO http://biomodels.net/sbo SBGN http://sbgn.org COMBINE http://co.mbine.org URLs
  • 55. Iā€™d like your feedback! You can use this anonymous form: http://tinyurl.com/mhuckafeedback