ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012
1. ISMB hashtag: #PP44
Highlights Track: Databases and Ontologies
Toward interoperable bioscience data
Susanna-Assunta Sansone, PhD
Principal Investigator, Team Leader,
University of Oxford e-Research Centre, Oxford, UK
@isatools
@biosharing
ISMB 2012, Long Beach, California, USA, July 15-17
2. ISMB tag:
What is this presentation about? #PP44
§ ISA Commons, a grassroots collaboration that works to facilitate the
collection, curation and sharing of experiments in an
increasingly diverse set of life science domains, using a common,
structured representation of the experiments that
• transcends individual biological and technological domains,
• follows the appropriate community norms and standards, many of them
listed in the BioSharing catalogue, and
• is implemented by several curation, storage and data-sharing tools
TOWARDS INTEROPERABLE BIOSCIENCE DATA, Feb 2012, doi:10.1038/ng.1054
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann
S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B,
Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S,
Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J,
Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E,
McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A,
Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
www.biosharing.org www.isacommons.org
3. ISMB tag:
From reusable data to reproducible research #PP44
To make datasets comprehensible, interoperable and reusable, so that they
can underpin future investigations, we need common ways to report and
share the experimental details and the associated results.
Consistent reporting will have a positive and long-lasting impact on the
value of collective scientific outputs.
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
4. ISMB tag:
Structured description of datasets #PP44
§ Capture all salient features of the experimental workflow
§ Make annotation explicit and discoverable
§ Structure the descriptions for consistency, tracking both
• independent variables and
• dependent variables
§ Use cross references and resolvable identifiers
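A minimal sketch, assuming a simplified table modelled loosely on ISA-Tab column names (`Sample Name`, `Characteristics[...]`, `Term Source REF`, `Term Accession Number`, `Factor Value[...]`), of what a structured description of this kind can look like. Each sample characteristic carries an ontology source and a resolvable accession, and factor values (the independent variables of the design) sit in their own columns. The `OntologyTerm`/`Sample` classes, the `to_study_table` helper, the sample names and the dose values are hypothetical illustrations, not part of the ISA software.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class OntologyTerm:
    """A value annotated with a resolvable identifier (term source + accession)."""
    label: str
    source: str = ""     # ontology acronym, e.g. "NCBITaxon"
    accession: str = ""  # resolvable URI for the term


@dataclass
class Sample:
    name: str
    # Characteristics describe the sample itself (e.g. organism).
    characteristics: dict[str, OntologyTerm] = field(default_factory=dict)
    # Factor values are the independent variables of the study design.
    factor_values: dict[str, str] = field(default_factory=dict)


def to_study_table(samples: list[Sample]) -> str:
    """Serialize samples as a tab-delimited table with ISA-Tab-style headers."""
    # Collect keys across all samples so every row has the same columns.
    char_keys = sorted({k for s in samples for k in s.characteristics})
    factor_keys = sorted({k for s in samples for k in s.factor_values})

    header = ["Sample Name"]
    for key in char_keys:
        header += [f"Characteristics[{key}]", "Term Source REF", "Term Accession Number"]
    header += [f"Factor Value[{key}]" for key in factor_keys]

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for s in samples:
        row = [s.name]
        for key in char_keys:
            # Missing characteristics become empty cells rather than errors.
            term = s.characteristics.get(key, OntologyTerm(""))
            row += [term.label, term.source, term.accession]
        row += [s.factor_values.get(key, "") for key in factor_keys]
        writer.writerow(row)
    return buf.getvalue()


# Usage: two hypothetical samples differing only in one factor (dose).
mouse = OntologyTerm("Mus musculus", "NCBITaxon",
                     "http://purl.obolibrary.org/obo/NCBITaxon_10090")
samples = [
    Sample("sample-1", {"organism": mouse}, {"dose": "0 mg/kg"}),
    Sample("sample-2", {"organism": mouse}, {"dose": "10 mg/kg"}),
]
table = to_study_table(samples)
print(table)
```

Keeping the ontology source and accession next to each free-text label is what makes the annotation machine-resolvable while staying readable in a spreadsheet.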
5. ISMB tag:
Not too much, not too little, just ‘right’ #PP44
§ We must strike a balance between
• the depth and breadth of information, and
• the information required to reuse the data
6. Example of experiments by InnoMed PredTox, an FP6 public-private consortium
experimental design, sample characteristic(s), experimental variable(s),
technology(ies), measurement(s), protocol(s), data file(s), ...
7. ISMB tag:
A ‘general mobilization’ to develop standards, e.g.: #PP44
• use the same word and refer to the same ‘thing’
• report the same core, essential information
• allow data to flow from one system to another
Challenges: different communities, different norms and standards,
lack of coordination, fragmentation and uneven coverage…
8. ISMB tag:
Growing number of reporting standards #PP44
Approximate counts: +303, +150 and +130 (sources: MIBBI, BioPortal, EQUATOR; estimated)
Each one focuses on a particular biological or technological domain
MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS,
FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE,
TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML, DO, PRO, IDO, MIASE,
MISFISHIE…
9. A catalogue to map the landscape of standards:
over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science
326, 234-36 (2009) doi:10.1126/science.1180598
10. ISMB tag:
Example of a multi-assay study – how many #PP44
‘standards’ are applicable to this?
14. ISMB tag:
#PP44
user community
15. ISMB tag:
#PP44
Metadata tracking framework, designed to
support the use of several standards (checklists,
terminologies) and conversions to (a growing number of)
other metadata formats used by public repositories, e.g.
MAGE-Tab, Pride-XML, SRA-XML and SOFT
Currently finalizing the conversion to RDF to
explore the growing Linked Data universe,
in collaboration with the W3C HCLSIG
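To give a feel for what an RDF conversion involves, here is a minimal sketch that turns one annotated sample into N-Triples statements. It is not the ISA team's RDF converter: the `http://example.org/isa/` namespace, its predicates and the `sample_to_ntriples` helper are hypothetical; only `rdfs:label` is a standard vocabulary term. The point it illustrates is that values already carrying a resolvable ontology identifier can be emitted as IRIs and linked directly into Linked Data, while free-text values fall back to literals.

```python
from urllib.parse import quote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace, for illustration only.
BASE = "http://example.org/isa/"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape characters that N-Triples does not allow raw inside a string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def sample_to_ntriples(name: str,
                       characteristics: dict[str, tuple[str, str]]) -> list[str]:
    """Map (label, term URI) pairs to triples; an empty URI means free text only."""
    subject = iri(BASE + "sample/" + quote(name))
    triples = [f"{subject} {iri(RDFS_LABEL)} {literal(name)} ."]
    for key, (label, term_uri) in sorted(characteristics.items()):
        predicate = iri(BASE + "characteristic/" + quote(key))
        # Prefer the resolvable ontology identifier; fall back to the label.
        obj = iri(term_uri) if term_uri else literal(label)
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


# Usage: a hypothetical sample with one ontology-annotated and one free-text value.
triples = sample_to_ntriples("sample 1", {
    "organism": ("Mus musculus", "http://purl.obolibrary.org/obo/NCBITaxon_10090"),
    "sex": ("female", ""),
})
print("\n".join(triples))
```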
16. ISMB tag:
#PP44
ISA software suite: supporting standards-compliant experimental
annotation and enabling curation at the community level
(Rocca-Serra et al., 2010)
A collaborative effort of international research/service groups:
University of Oxford, EBI, Harvard School of Public Health, NERC Environmental
Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for
Bioinformatics, Leibniz Institute of Plant Biochemistry and more…
17. ISMB tag:
#PP44
To mint DOIs
empowering researchers to use standards
18. ISMB tag:
#PP44
Maguire E, Rocca-Serra P, Sansone SA, Davies J, Chen M.
Taxonomy-based glyph design, with a case study on visualizing
workflows of biological experiments.
IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
19. ISMB tag:
#PP44
Ontology Search and Tagging in Google Spreadsheets
21. A growing ecosystem of over 30 public and internal resources
using the ISA metadata tracking framework to facilitate standards-compliant
collection, curation, management and reuse of investigations in an
increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
The framework is also used by communities working to build a library of cellular signatures.
We aim to achieve a common representation of experimental content that
transcends individual bioscience domains.
22. Some of the public groups/resources and internal projects using the ISA framework:
• Stem Cell Commons
• Nanotechnology Informatics Working Group
25. ISMB tag:
Implementations at Harvard #PP44
data sharing in ISA-Tab
Importance of a local community
26. ISMB tag:
Implementation at the EBI #PP44
28. Extensions
Nanotechnology Informatics Working Group
30. Development timeline, 2007-2012
Community involvement and uptake:
• 1st, 2nd and 3rd ISA-Tab workshops
• other tools implement ISA-Tab
• user workshops/visits start
• 1st public instance
• Harvard Stem Cell Discovery Engine
• a growing number of systems start to adopt the ISA framework
Core developments:
• strawman ISA-Tab spec
• final ISA-Tab spec
• ISA software v1
• conversions to Pride-XML/SRA-XML/MAGE-Tab and more
• database instance at EBI
• links to analysis tools start
• RDF format starts
Publications:
• ISA-Tab and workshop reports
• Omics data sharing (Science)
• ISA software suite (Bioinformatics)
• Stem Cell Discovery Engine (NAR)
• ISA Commons (Nature Genetics)