SEMS: Model search and ranked retrieval (Ron Henkel)
1. Graph-based storage and retrieval of computational models
Ron Henkel, Martin Scharm, Dagmar Waltemath, Olaf Wolkenhauer
Department of Systems Biology and Bioinformatics
University of Rostock
www.sbi.uni-rostock.de
29.11.2012 © 2009 UNIVERSITÄT ROSTOCK
2. Motivation
[Figure: number of models (axis 0 to 1000) and number of annotations (axis 0 to 120000), plotted quarterly from Apr 2005 to Jul 2012. Data from BioModels Database.]
3. Motivation
• Models:
Grow in number and complexity
Are provided with supplementary material
Evolve over time
4. State of the Art
• Storage:
Relational Databases
Model files on Hard Disk Drive (HDD)
Additional files (images, result sets, paper)
• Search:
SQL statements
Faceted search
Data browsing
5. State of the Art - Demo
6. Available Data for Ranked Retrieval
• Model file:
Constituent names
Model code
• Annotation & Ontologies:
Biochemical background
Synonyms
• A model's network:
Model structure
Aggregate values
7. Available Data for Ranked Retrieval
#  Aspect          Importance  Contained features
1  Administrative  none        ids, file name, version, formalism…
2  Person          medium      creator, encoder, submitter, publication author
3  Dates           low         creation and modification date
4  Publication     high        title, abstract, full-text, journal
5  Constituents    very high   compartment, species, reaction
6  User content    very high   keywords, tags, remarks, changes
• The concept is abstract and can be applied to different model formalisms.
• Depending on the formalism the aspects can be refined into features.
• The model constituents also contain the annotations.
Henkel et al. (2010) BMC Bioinf
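The aspect table above can be read as a field-weighting scheme. The following is a minimal sketch, assuming each importance level maps to a numeric weight; the weight values, the function names, and the toy models are hypothetical and not taken from the talk or from Henkel et al. (2010).

```python
# Hypothetical numeric weights for the importance levels on the slide;
# the talk does not state actual values.
IMPORTANCE_WEIGHT = {
    "none": 0.0, "low": 0.5, "medium": 1.0, "high": 2.0, "very high": 4.0,
}

# Aspect -> importance, as listed in the table.
ASPECT_IMPORTANCE = {
    "administrative": "none",
    "person": "medium",
    "dates": "low",
    "publication": "high",
    "constituents": "very high",
    "user_content": "very high",
}


def score(model: dict, query: str) -> float:
    """Sum over aspects: weight * number of query terms found in that aspect."""
    terms = query.lower().split()
    total = 0.0
    for aspect, importance in ASPECT_IMPORTANCE.items():
        words = model.get(aspect, "").lower().split()
        hits = sum(1 for term in terms if term in words)
        total += IMPORTANCE_WEIGHT[importance] * hits
    return total


def rank(models: dict, query: str) -> list:
    """Return (model name, score) pairs with a non-zero score, best first."""
    scored = [(name, score(model, query)) for name, model in models.items()]
    return sorted((s for s in scored if s[1] > 0),
                  key=lambda s: s[1], reverse=True)


# Toy data, invented for illustration only.
MODELS = {
    "model_a": {"publication": "a model of the cell cycle",
                "constituents": "cyclin cdc2"},
    "model_b": {"person": "cyclin", "dates": "2005"},
}

print(rank(MODELS, "cyclin"))
```

A hit in the constituents outweighs the same hit in the person aspect, so `model_a` ranks above `model_b` for the query `cyclin`.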
8. BioModels Database – A Test Case
• Apache Lucene Framework
• Model Index
425 models, 140,977 terms
• Semantic Index
2,261 URIs, 409,124 terms
http://www.ebi.ac.uk/biomodels-demo/
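To illustrate what "terms" in an index means, here is a minimal inverted-index sketch in plain Python. It does not use the Apache Lucene API, and it assumes that the model index maps terms to model ids and the semantic index maps terms to annotation URIs; the slide does not spell out the index layout, and the sample texts are invented.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index: dict, term: str) -> list:
    """Return the sorted ids of all documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Toy inputs, invented for illustration only.
MODEL_TEXT = {
    "m1": "cell cycle model",
    "m2": "calcium oscillation model",
}
URI_TEXT = {
    "GO:0005737": "cytoplasm",
}

model_index = build_index(MODEL_TEXT)      # term -> model ids
semantic_index = build_index(URI_TEXT)     # term -> annotation URIs

print(len(model_index), "distinct terms in the model index")
print(lookup(model_index, "model"))
print(lookup(semantic_index, "cytoplasm"))
```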
10. Improvements
• Ranking
• Enhanced query possibilities
Required, optional and excluded criteria
Allow full-text and Ontology queries
• Example: “Find cell cycle models”
Query         BioModels DB  Using IR  Gold standard
cell cycle    135           173       n/a
“cell cycle”  14            26        28
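The difference between the term query and the quoted phrase query, and the idea of required, optional and excluded criteria, can be sketched as follows. This is an illustrative sketch in plain Python with invented documents; the function names are hypothetical and it is not the query engine used in the talk.

```python
def tokens(text: str) -> list:
    return text.lower().split()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of the phrase occur consecutively in the text."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def search(docs: dict, required=(), optional=(), excluded=()) -> list:
    """Keep documents with all required and no excluded terms.

    Results are ordered by the number of matching optional terms,
    then by document id.
    """
    results = []
    for doc_id, text in docs.items():
        words = set(tokens(text))
        if any(term.lower() not in words for term in required):
            continue
        if any(term.lower() in words for term in excluded):
            continue
        bonus = sum(1 for term in optional if term.lower() in words)
        results.append((doc_id, bonus))
    return [doc_id for doc_id, _ in sorted(results, key=lambda r: (-r[1], r[0]))]


# Toy documents, invented for illustration only.
DOCS = {
    "d1": "cell cycle regulation in yeast",
    "d2": "circadian cycle in a plant cell",
    "d3": "cell cycle arrest in yeast mutants",
}

# Both terms anywhere in the text: also matches d2.
print(search(DOCS, required=["cell", "cycle"]))
# Exact phrase: d2 drops out.
print([d for d, text in DOCS.items() if contains_phrase(text, "cell cycle")])
# Required, optional and excluded criteria combined.
print(search(DOCS, required=["cell"], optional=["yeast"], excluded=["mutants"]))
```

As in the table, the unquoted query returns more hits than the quoted phrase because it accepts documents in which the two words are unrelated.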
11. Available Data for Ranked Retrieval
• Model based:
Model name
Model code
• Annotation & Ontologies:
Biochemical background
Allows identifying, e.g., synonyms
• A model's network:
Include model structure
Aggregate values
12. Mapping a Model to a Database
A model's network
• Include model structure
• Aggregate values
13. Advantages of Graph Databases
• Easy mapping of model structure
• Fast browsing through models
• Flexible and schema-free storage
• Easy linking to models, simulation setups or results,
and external resources
14. Document
[Figure: graph diagram. Node labels: Model, R, P, S, E, C. Edge labels: asProduct, asReactant, asModifier, isEncodedBy, isVersionOf, is (three times). Annotation identifiers: uniprot:P0710, uniprot:Q0339, SBO:0000268, HGNC:8582, GO:0005737.]
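The mapping of a model onto a graph can be sketched with a tiny in-memory structure. Only the edge labels asReactant, asProduct, isVersionOf and is, and the two annotation identifiers, come from the slide; the class, the node ids, the node types, the edge directions and the `contains` label are assumptions made for illustration, and no particular graph database is implied.

```python
class ModelGraph:
    """Tiny in-memory directed graph with typed nodes and labeled edges."""

    def __init__(self):
        self.node_type = {}  # node id -> node type
        self.edges = []      # (source id, edge label, target id)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def sources(self, label, target):
        """All nodes with an edge of this label pointing at target."""
        return [s for s, l, t in self.edges if l == label and t == target]


def build_example() -> ModelGraph:
    g = ModelGraph()
    # Hypothetical node ids and types.
    for node_id, node_type in [
        ("model_1", "model"), ("compartment_1", "compartment"),
        ("species_1", "species"), ("species_2", "species"),
        ("reaction_1", "reaction"),
        ("GO:0005737", "annotation"), ("uniprot:P0710", "annotation"),
    ]:
        g.add_node(node_id, node_type)
    # "contains" is a hypothetical label; the others appear on the slide.
    for part in ("compartment_1", "species_1", "species_2", "reaction_1"):
        g.add_edge("model_1", "contains", part)
    g.add_edge("species_1", "asReactant", "reaction_1")
    g.add_edge("species_2", "asProduct", "reaction_1")
    g.add_edge("compartment_1", "is", "GO:0005737")
    g.add_edge("species_1", "isVersionOf", "uniprot:P0710")
    return g


g = build_example()
print(g.sources("asReactant", "reaction_1"))  # reactants of the reaction
print(g.sources("is", "GO:0005737"))          # elements carrying this annotation
```

Because annotations are ordinary nodes, a structure query (which species take part in a reaction) and an ontology query (which elements carry a given term) use the same traversal.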
16. Preliminary Results
• All models stored in BioModels Database were imported into the graph database
• Implemented storage and search in Jummp
Official demo release upcoming
• Added 140,811 models from the path2models project
Done, but including the annotations exhausts the available memory
Database scales well and is reasonably fast
18. Future Work: Relate model versions
• Link successor and predecessor
• Relate changed entities
• Store the diff
• Enable version control for multi-document models
• Propagate changes for imported models
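The first three items (linking successor and predecessor, relating changed entities, storing the diff) can be sketched as follows. This is a hypothetical illustration that represents a model version as a flat mapping from entity id to definition; it is not the XML difference detection referenced in the talk, and all names and sample data are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


def diff_entities(old: dict, new: dict) -> dict:
    """Compare two versions given as entity id -> definition."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


@dataclass
class Version:
    version_id: str
    entities: dict
    predecessor: Optional["Version"] = None
    successor: Optional["Version"] = None
    diff: dict = field(default_factory=dict)  # diff against the predecessor


def add_successor(prev: Version, version_id: str, entities: dict) -> Version:
    """Create a new version, link it both ways, and store the diff on it."""
    new = Version(version_id, entities, predecessor=prev,
                  diff=diff_entities(prev.entities, entities))
    prev.successor = new
    return new


# Toy versions, invented for illustration only.
v1 = Version("v1", {"s1": "glucose", "r1": "s1 -> s2"})
v2 = add_successor(v1, "v2", {"s1": "glucose", "r1": "s1 -> s3", "s3": "pyruvate"})

print(v2.diff)
```

Storing the diff on the successor keeps each version self-describing; a graph store could equally attach it to the edge between the two versions.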
19. SEMS: Methods for Model & Simulation Management
Model Version control Model Storage Model Search
• XML version control • Relational databases • Ranked model retrieval
Waltemath et al., 2011 (DBSpektrum) Henkel et al., 2010 (BMC Bioinf)
• Difference detection in XML
Waltemath et al., submitted • Graph-based storage • Structure- and
Henkel et al., 2012 (INFORMATIK) ontology-based search
Simulation VC Simulation Storage Simulation Search
• Standardized encoding of simulation setups Waltemath et al., 2011 (BMC SysBiol)
• Linking models and simulation descriptions Henkel et al., 2012 (INFORMATIK)
20. Take Home Message
• Ranked retrieval is a necessary feature for model
databases.
• The model’s inherent structure should be queryable.
• Graph-based storage reflects a model's encoding and evolution well.
21. Thanks for your attention.
Questions?
ron.henkel@uni-rostock.de