The buzz around reproducible bioscience data:
the policies, the communities and the standards

            Susanna-Assunta Sansone, PhD
            Principal Investigator and Team Leader,
     University of Oxford e-Research Centre, Oxford, UK

                           Slides at:
          http://www.slideshare.net/SusannaSansone



    SPSAS e-SciBioEnergy São Paulo School of Advanced Science on
   e-Science for Bioenergy Research, 22-26 Oct, 2012, Campinas, Brazil
Lab scientist
Data scientist
Team Leader, Consultant
Oxford e-Research Centre



             •  Providing research computing, high-performance computing
             •  Integrating with national and international infrastructure
             •  Supporting leading edge facilities through education and training
Oxford e-Research Centre


          Collaborating with European and wider
          international groups in, e.g.:
               •  energy,
               •  radio astronomy,
               •  biological data federation,
               •  life sciences simulation,
               •  biodiversity,
               •  computational chemistry,
               •  neuroscience,
               •  digital humanities tools,
               •  digital music analysis

          Research in
             •  computation,
             •  data infrastructure and analysis,
             •  visualisation
My team’s activities and groups we work with
      data management and biocuration, collaborative development
           of software and databases, standards and ontologies

•    environmental genomics                      •    stem cell discovery
•    metabolomics                                •    systems biology
•    metagenomics                                •    transcriptomics
•    nanotechnology                              •    toxicogenomics
•    proteomics                                  •    environmental health




        env   agro   tox/pharma   health
http://www.flickr.com/photos/12308429@N03/4957994485/   CC BY
Outline



        “The buzz around reproducible bioscience data:
       the policies, the communities and the standards”


“The reality from the buzz:
how to deliver reproducible bioscience data”
Preserve
    institutional /
      corporate
       memory
Harmonize collection across sites
    Find matching studies
     Data dissemination
  Long-term data stewardship
Utilize
public data

Identify suitable data
       Retrieve
Curate and harmonize
     Re-analyze


Address
reproducibility /
     reuse
 of public data


                    Ioannidis et al., Repeatability of published microarray
                    gene expression analyses. Nature Genetics 41(2),
                    149-55 (2009) doi:10.1038/ng.295
http://www.flickr.com/photos/notbrucelee/8016189356/   CC BY
COMPREHENSIBLE
    INTEROPERABLE
    REPRODUCIBLE
       REUSABLE




Growing, worldwide movement for reproducible research




     Shared, annotated research data and methods offer new discovery
         opportunities and prevent unnecessary repetition of work.
             Improved data sharing underpins science of the future

     “Publicly-funded research data are a public good, produced in the public interest”
     “Publicly-funded research data should be openly available to the maximum extent possible”
      The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
Growing, worldwide movement for reproducible research

 esoteric formats; lack of sufficient contextual information; ad hoc or proprietary terminology
 comprehensible? interoperable? reusable? reproducible?



§  Researchers and bioinformaticians in both academic and commercial
    science, along with funding agencies and publishers, embrace the
    concept that community-developed standards are pivotal to structure
    and enrich the annotation of
         •  entities of interest (e.g., genes, metabolites, phenotypes) and
         •  experimental steps (e.g., provenance of study materials,
            technology and measurement types)
Structure and enrich description of the experiments
§  Describe and communicate the information in an unambiguous,
    human and machine readable manner

    Seven week old C57BL/6N mice were treated
    with low-fat diet.
    Liver was dissected out, RNA prepared…etc.

         Age value
         Unit
         Strain name
         Subject of the experiment
         Type of diet and experimental condition
         Type of protocol - sample treatment
         Type of protocol - nucleic acid extraction
         Anatomy part
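
As a rough illustration of the labels above, the free-text sentence about the mice could be captured as a structured record. This is only a sketch: the class name `SampleAnnotation` and its field names are assumptions made for this example, not part of any standard named in the talk.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SampleAnnotation:
    """One sample described field by field (field names are illustrative)."""
    subject: str
    strain_name: str
    age_value: int
    age_unit: str
    diet: str
    anatomy_part: str
    protocol_types: list = field(default_factory=list)


# "Seven week old C57BL/6N mice were treated with low-fat diet.
#  Liver was dissected out, RNA prepared..."
liver_sample = SampleAnnotation(
    subject="mouse",
    strain_name="C57BL/6N",
    age_value=7,
    age_unit="week",
    diet="low-fat diet",
    anatomy_part="liver",
    protocol_types=["sample treatment", "nucleic acid extraction"],
)

# A plain dictionary is easy for both people and software to read.
record = asdict(liver_sample)
print(record)
```

Each value now sits under an explicit label, so a tool can filter on `strain_name` or `anatomy_part` without parsing prose.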
Figure: credit to OBI consortium
reasoning visualization
            analysis browsing integration
                exchange retrieval

Community                                   Software
Standards                                    Tools
                 Well-annotated &
                 Structured Data


                Reproducible &
                   Reusable
              Bioscience Research
http://www.flickr.com/photos/lamerentertainment/1581770980/sizes/m/in/photostream/
Today’s bioscience research
                                              Publications
  Experimental
      and
 computational
     data




§  Is interdisciplinary and integrative in character
    •  need to deal with new and existing datasets
    •  deal with a variety of data types
§  ‘How the organism works’ is the focus
    •    Twenty years ago data was the center
                                               Source of the figure: EBI website
                                                                                                Source: http://ebbailey.wordpress.com
                                                                                    www.ebi.ac.uk/net-project
Example from the toxicogenomics domain

                        Study looking at the effect of a compound inducing
                        liver damage by characterizing/measuring
                        -  the metabolic profile by MS and NMR
                        -  protein expression in liver by MS
                        -  gene expression by DNA microarray
                        -  conducting genetic and phenotypical analysis
                        Information contributing to the construction and
                        validation of systems biology models
Example of experiments by InnoMed PredTox, an FP6 public-private consortium
Structured description of datasets




                       §  Capture all salient features
                           of the experimental workflow
                       §  Make annotation explicit and
                           discoverable
                       §  Structure the descriptions
                           for consistency, tracking
                            §  independent variables
                            §  dependent variables
                           using cross references and
                           resolvable identifiers
Not too much, not too little, just ‘right’




                          §  We must strike a balance
                              between
                               •  depth and breadth of
                                  information; and
                               •  sufficient information
                                  required to reuse the data
Information intensive experiments


                     To make the experiments
                     comprehensible and reusable,
                     underpinning future
                     investigations, we need
                     common ways to report and
                     share the experimental details
                     and the associated data.

                     Consistent reporting will have a
                     positive and long-lasting impact
                     on the value of collective
                     scientific outputs.
Common ways to report and share

§ The challenges we face
  •    Large in volume: lots of data types and metadata!
  •    Lots of free text descriptions: hard to mine, subject to mistakes!
  •    Babel of terminologies: lack of definitions, hard to map!
  •    Heterogeneous file formats: software lock-in!
§ Need for reporting standards
  •  Minimal reporting descriptors
       - Report the same ‘core essentials’
  •  Controlled vocabularies or ontologies
       - Use the same word and mean the same thing
  •  Common exchange formats
       - Make tools interoperable, allow data exchange and integration
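
The three kinds of reporting standard listed above can be sketched in a few lines. Everything named here is hypothetical and chosen only for illustration: the required fields, the allowed terms, and the function `check_report` do not come from any real checklist or ontology.

```python
import json

# Minimal reporting descriptors: the 'core essentials' (hypothetical list).
REQUIRED_FIELDS = {"organism", "tissue", "technology"}

# Controlled vocabulary: the allowed words per field (hypothetical terms).
CONTROLLED_TERMS = {
    "tissue": {"liver", "kidney"},
    "technology": {"DNA microarray", "MS", "NMR"},
}


def check_report(report):
    """Return a list of problems found in one experiment description."""
    problems = []
    for name in sorted(REQUIRED_FIELDS - set(report)):
        problems.append(f"missing required field: {name}")
    for name, allowed in sorted(CONTROLLED_TERMS.items()):
        value = report.get(name)
        if value is not None and value not in allowed:
            problems.append(f"uncontrolled term for {name}: {value!r}")
    return problems


good = {"organism": "mouse", "tissue": "liver", "technology": "DNA microarray"}
bad = {"organism": "mouse", "tissue": "Liver tissue"}

print(check_report(good))
print(check_report(bad))

# Common exchange format: here simply JSON, so another tool can read it back.
exchanged = json.loads(json.dumps(good, sort_keys=True))
```

The free-text value `"Liver tissue"` is flagged because it is not one of the agreed terms, which is the kind of mismatch a shared vocabulary is meant to prevent.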
Reporting standards – the benefits

§  Describe and communicate the information to others, in an
    unambiguous manner
§  To unlock the value in the data
   •  Compare, query and evaluate data
       - Facilitate scientific validation of the findings
   •  Understand variability within/between different technologies and
      protocols
       -  Facilitate technical validation
       -  Enable optimization of the experimental designs
       -  Identify critical checkpoints and develop quality metrics
§  To define submission and/or publication requirements
   •  Journals
   •  Databases
§  To ensure data integrity, reproducibility and (re)use
Escalating number of standardization efforts in bioscience, e.g.:
•  Genome annotation: www.geneontology.org
•  Genomic Standards Consortium (GSC): gensc.org
•  Functional Genomics Data Society (FGED): www.fged.org
•  Enzymology data standards: www.strenda.org
•  HUPO Proteomics Standards Initiative (PSI): http://www.psidev.info
•  Systems modelling standards: www.co.mbine.org
•  Cheminformatics: www.ebi.ac.uk/chebi
•  Pathways: www.biopax.org
•  Metabolomics Standards Initiative (MSI): http://www.metabolomicssociety.org
Different communities, different norms and standards, e.g.:

•  allow data to flow from one system to another
•  use the same word and refer to the same ‘thing’
•  report the same core, essential information

Challenges: lack of coordination, fragmentation and uneven coverage
Is this ‘general mobilization’ good or bad?




§  Differences in structures and processes:
    •  organization types (open, closed to members, society, WG…)
    •  standards development (how to design, develop, evaluate, maintain…)
    •  adoption, uptake, outreach (link to journals, funders, commercial sector…)
    •  funds (sponsors, memberships, grants, volunteering…)
§  Fragmentation of the standards is a major issue
     •  Being focused on particular communities’ interests, be their individual
        technologies or biological/biomedical disciplines, leads to duplication of effort,
        and more seriously, the development of (largely arbitrarily) different standards
     •  This severely hinders the interoperability of databases and tools and ultimately
        the integration of datasets
Fragmentation of the databases and data, e.g.

Three EBI omics systems, DIFFERENT at each stage:
•  Access: download formats
•  Storage: core requirements represented, representation of the studies and related samples, curation practices
•  Submission: formats, terminologies and tools
To integrate data we need interoperable standards

[Figure: biologically-delineated views of the world (plant biology, epidemiology, microbiology)
 and technologically-delineated views of the world (arrays, scanning, gels, columns, MS, NMR, FTIR;
 e.g. transcriptomics, metabolomics) share generic features (common core):
 - description of source biomaterial
 - experimental design components]
Need to address the fragmentation

§  Promote synergies
   •  Among basic academic (omics) research but also regulatory- or
      healthcare-driven initiatives
§  Much could be learned from exchange of ideas and practices
   •  although regulatory- or healthcare-driven initiatives have far stricter
      guidelines
   •  although SDOs often have ‘closed’ discussions and require membership
§  Create interoperable standards
   •  Fit neatly into a jigsaw, resolving inconsistency and filling gaps
§  Overcome several barriers
   •  Technical
   •  Funding
   •  Sociological…
Eloquent quotes


      “Biologists would rather share their toothbrush
      than their gene name”
      Michael Ashburner, Professor of Genetics,
      University of Cambridge, UK



     “Any customer can have a car painted any
     colour that he wants so long as it is black”

     Henry Ford, you know who he is…


Standards – an old issue, e.g. engineering in 1850

§  Buying nuts and bolts is easy today
     •  But in the 19th century it was very complicated!
§  Nuts and bolts were custom made
    •  Products from different shops were incompatible
    •  Craftsmen liked the monopoly
         - Customers were ‘locked in’ !!
§  In 1864 William Sellers initiated the standardization
    •  Mass production
    •  Get interchangeable parts
    •  Standardized way to make nuts and bolts
§  Generally adopted only after WWII, though …. !!
Social engineering




Ownership of open standards can be problematic in broad, grass-roots collaborations; it requires improved models to encourage maintenance of, and contributions to, these efforts, supporting their evolution.
The extensive community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors.
The cost of implementing a standards-supported data sharing vision is as large as the number of stakeholders that must operate synchronously.
1. Funders actively developing data policies




§  Several data preservation, management and sharing policies have
    emerged in response to increased funding for omics domains
§  Even if only in general terms, standards are recognized as necessary ‘tools’ to
    unambiguously represent, describe and communicate research data
2. Similar trend in the regulatory arena




§  “… lack of standardized data affects CDER’s review processes by curtailing a
    reviewer’s ability to perform integral tasks such as rapid acquisition, storage,
    analysis......efficient management of a portfolio of standards projects will
    require coordinated efforts and clear roles for multiple participants within/outside
    FDA”
3. Publishers have become strong advocates




§  Continue to support the development of open standards and tools
     •  to support sharing of sufficiently well annotated datasets
     •  to enable comprehensible, reusable, reproducible research
….the rise of data-driven journals, e.g.:




                                        partnering with:
4. Similar trend in the commercial sector




§  R&D has invested heavily in procedures and tools that integrate external
    information with their own data to enhance the decision-making process
§  Now joining forces to streamline non-competitive elements of the life
    science workflow by the specification of common standards, business
    terms, relationships and processes
....their information landscape is evolving


[Figure: Yesterday / Today / Tomorrow views of the Big Life Science Company alongside:
 public content provider, proprietary content provider, CRO, academic group,
 regulatory authorities, service provider, software vendor]

                  Yesterday                     Today                                        Tomorrow
Innovation model  Innovation inside             Searching for innovation                     Heterogeneity of collaborations; part of the wider ecosystem
IT                Internal apps & data          Struggling with change, security and trust   Cloud, services
Data              Mostly inside                 In and out                                   Distributed
Portfolio         Internally driven and owned   Partially shared                             Shared portfolio



                                                                                      Credit to: Pistoia Alliance
http://www.flickr.com/photos/idiolector/289490834/   CC BY
Take home messages



      “The buzz around reproducible bioscience data:
     the policies, the communities and the standards”



u  Contribute to the reproducible research movement

u  Learn about open community-standards in your area

u  Consider data science as a career path
Outline



        “The buzz around reproducible bioscience data:
       the policies, the communities and the standards”


“The reality from the buzz:
how to deliver reproducible bioscience data”
How do we achieve this? Is it possible to achieve a common,
  structured representation of diverse bioscience experiments
  that:
  •  follows the appropriate community standards and
  •  delivers COMPREHENSIBLE, INTEROPERABLE, REPRODUCIBLE, REUSABLE research?
Growing number of reporting standards




MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, MzML, ISA-Tab, SEDML…
AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO…
MIAME, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE…
Growing number of reporting standards
  + 130  (estimated)
  + 303  (Source: BioPortal)
  + 150  (Source: MIBBI, EQUATOR)
  Databases, annotation, curation tools
But how much do we know about these standards?
•  Which tools and databases implement which standards?
•  I use high throughput sequencing technologies, which ones are applicable to me?
•  What are the criteria to evaluate their status and value?
•  How can I get involved to propose extensions or modifications?
•  Which ones are mature enough for me to use or recommend?
•  I work on plants, are these just for biomedical applications?
But how much do we know about these standards

§  A bewildering array of standards is available, but
   •  these are hard to find, at different levels of maturity; in
      some areas duplications or gaps in coverage also exist

§  Standards are just a ‘means to an end’, therefore
   •  we want to make them discoverable and accessible,
      maximizing their use to assist the virtuous data cycle,
      from generation to standardization through publication to
      subsequent sharing and reuse
(2007) Vol 25 No 11




obofoundry.org
Towards Lego-like ontologies

 §  Compound terms should be formed out of simpler constituents:
     •  Body weight
         weight (quality ontology, PATO)
             that
                  inheres_in (relation ontology, RO)
                           whole_organism (anatomy ontology, CARO)

      •  Xylene contaminated soil
          soil (environmental ontology, EnvO)
               that
                    has_contaminated (relation ontology, RO)
                                                 xylene (chemical ontology, ChEBI)
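
The Lego-like pattern above (a base term, a relation, a filler term) can be sketched as a small data structure. The classes `Term` and `CompoundTerm` are assumptions made for this illustration; they are not an API of any ontology library, and the labels are copied from the two examples on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A simple term plus the ontology it is taken from."""
    label: str
    ontology: str


@dataclass(frozen=True)
class CompoundTerm:
    """base 'that' relation filler, each part reused from its own ontology."""
    base: Term
    relation: Term
    filler: Term

    def render(self):
        parts = (self.base, self.relation, self.filler)
        base, relation, filler = (f"{t.label} ({t.ontology})" for t in parts)
        return f"{base} that {relation} {filler}"


body_weight = CompoundTerm(
    base=Term("weight", "PATO"),
    relation=Term("inheres_in", "RO"),
    filler=Term("whole_organism", "CARO"),
)

xylene_contaminated_soil = CompoundTerm(
    base=Term("soil", "EnvO"),
    relation=Term("has_contaminated", "RO"),
    filler=Term("xylene", "ChEBI"),
)

print(body_weight.render())
print(xylene_contaminated_soil.render())
```

Because each constituent keeps its source ontology, the same `weight` or `xylene` term can be reused in other compound terms rather than redefined.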


(2008) Vol 26 No 8




mibbi.org
§  Serves researchers, biocurators, journal editors and reviewers, and funders to
   •  discover checklists for a particular domain
   •  monitor progress of extant efforts
   •  facilitate collaborations
Science (2009), Vol 326, 234-236




http://biosharing.org
A catalogue to map the landscape of standards and the systems implementing them:
Over 400 bio-standards (public and in curation)

Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
•    A coherent, curated and searchable catalogue of data sharing resources
•    Bioscience standards and associated data-sharing policies, publications, tools and databases
•    Assessment criteria for usability and popularity of standards
•    Relationships among standards
•    Encouragement for communication & interaction among groups
•    Promoting interoperability & informed decisions about standards
Smith et al, 2007




Taylor, Field, Sansone et al, 2008

List of databases, linked to standards a collaboration with                                                 Database Issue




Major challenge: define ‘relations’ among standards




The relationship among popular standard formats for pathway information:
BioPAX and PSI-MI are designed for data exchange to and from databases and
pathway and network data integration. SBML and CellML are designed to
support mathematical simulations of biological systems and SBGN represents
pathway diagrams.

CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
Example of a multi-assay study – how many ‘standards’
                are applicable to this?
An exemplar approach to the status quo

§  A grassroots collaborative that works to facilitate collection, curation
   and sharing of experiments using a common, structured representation
   of the experiments that
    •  transcends individual biological and technological domains and
    •  can be ‘configured’ to implement (several of) the community
        standards



              TOWARDS INTEROPERABLE BIOSCIENCE DATA, Feb 2012, doi:10.1038/ng.1054

              Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann
              S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B,
              Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S,
              Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland
              L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A,
              Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B,
              Wolstencroft K, Xenarios J, Hide W.

              www.biosharing.org          www.isacommons.org
metadata tracking framework




                                                           user community



General-purpose, configurable format,
designed to support the use of several
standards checklists, terminologies and
conversions to (a growing number of) other
metadata formats, used by public
repositories, e.g.
   •  MAGE-Tab
   •  Pride-xml
   •  SRA-xml
   •  SOFT
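
To make the idea of a tab-delimited, configurable metadata table concrete, here is a minimal Python sketch that reads one. The table content is made up for illustration (it reuses the C57BL/6N mouse example from earlier in the talk), the column headers are modelled only loosely on the ISA-Tab style and should be treated as assumptions, and the helpers `read_table` and `characteristics` are hypothetical names, not part of the ISA software suite.

```python
import csv
import io

# Hypothetical sample table; column headers are illustrative assumptions.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[strain]\t"
    "Protocol REF\tSample Name\n"
    "mouse1\tMus musculus\tC57BL/6N\tsample treatment\tliver1\n"
    "mouse2\tMus musculus\tC57BL/6N\tsample treatment\tliver2\n"
)


def read_table(text):
    """Parse a tab-delimited table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def characteristics(row):
    """Collect the bracketed 'Characteristics[...]' columns of one row."""
    prefix, suffix = "Characteristics[", "]"
    return {
        name[len(prefix):-len(suffix)]: value
        for name, value in row.items()
        if name.startswith(prefix) and name.endswith(suffix)
    }


rows = read_table(STUDY_TABLE)
for row in rows:
    print(row["Sample Name"], characteristics(row))
```

A plain tabular layout like this is readable in any spreadsheet program while remaining easy for software to parse and convert into other formats.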
ISA software suite: supporting standards-compliant experimental
annotation and enabling curation at the community level
(Rocca-Serra et al, 2010)
A collaborative effort of international research/service groups:
University of Oxford, EBI, Harvard School of Public Health, NERC Environmental
Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for
Bioinformatics, Leibniz Institute of Plant Biochemistry and more.
Create template(s) to fit the type of
    experiments to be described

    Create templates detailing the steps to be
    reported for different investigations, complying
    with community standards, e.g. configuring the
    value(s) allowed for each field to be
    •  text (with/without regular expression testing),
    •  ontology terms,
    •  numbers etc.
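
The kind of per-field configuration listed above can be sketched as follows. This is a hedged illustration only: the `TEMPLATE` dictionary, its field names, the allowed ontology terms and the functions `is_valid` and `validate` are all hypothetical and do not reflect the actual configuration format used by the ISA configurator.

```python
import re

# Hypothetical template: each field declares the kind of value it accepts.
TEMPLATE = {
    "Sample Name": {"kind": "text", "pattern": r"[A-Za-z0-9_-]+"},
    "Organism": {"kind": "ontology", "allowed": {"Mus musculus", "Homo sapiens"}},
    "Age": {"kind": "number"},
    "Comment": {"kind": "text"},  # free text, no regular expression test
}


def is_valid(field, value):
    """Check one value against the rule configured for its field."""
    rule = TEMPLATE[field]
    if rule["kind"] == "text":
        pattern = rule.get("pattern")
        return pattern is None or re.fullmatch(pattern, value) is not None
    if rule["kind"] == "ontology":
        return value in rule["allowed"]
    if rule["kind"] == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule['kind']}")


def validate(record):
    """Return the names of the fields whose values break the template."""
    return [field for field, value in record.items() if not is_valid(field, value)]


good = {"Sample Name": "liver_1", "Organism": "Mus musculus", "Age": "7", "Comment": "any text"}
bad = {"Sample Name": "liver 1", "Organism": "mouse", "Age": "seven"}
print(validate(good))
print(validate(bad))
```

In a real setting the allowed ontology terms would come from an ontology lookup rather than a hard-coded set.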




1
Describe, curate your experiment
     with geographically distributed
     collaborators

     Report and edit the description of the
     investigation using customized Google Spreadsheets
     (importing the ‘template’ created by the ISA
     configurator) enabled with ontology search and
     term-tagging features.




2a
Or describe, curate your experiment
     using a desktop-based tool	

     Report and edit the description using this tool
     (also customized using the templates), with a
     spreadsheet-like look and feel, packed with
     functionalities such as

     • ontology search (access via             ) 	

     • term-tagging features	

     • import from spreadsheets etc…




2b
ISMB tag:
                                                                                                                         #PP44




                                                                                                        To mint DOIs







                                  empowering researchers to use standards
Perform data analysis	

    	





    We are building relevant ISA modules for GenomeSpace, 	

    R-based BioConductor and Galaxy tools	





3
Share your experiments with the
    world as Linked Open Data	

    	





    Through conversion to RDF; work in
    collaboration with the W3C HCLSIG	
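
As a rough illustration of what turning tabular metadata into RDF statements can look like, the sketch below writes one row as N-Triples lines using only the Python standard library. It is not the conversion developed with the W3C HCLSIG: the namespace `http://example.org/isa/`, the column names and the functions `iri`, `literal` and `row_to_triples` are assumptions made for this example.

```python
from urllib.parse import quote

BASE = "http://example.org/isa/"  # hypothetical namespace, for illustration only


def iri(local_name):
    """Build an IRI in the example namespace, percent-encoding unsafe characters."""
    return f"<{BASE}{quote(local_name, safe='')}>"


def literal(value):
    """Write a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row, subject_column="Sample Name"):
    """Turn one table row into subject-predicate-object statements."""
    subject = iri(row[subject_column])
    return [
        f"{subject} {iri(column)} {literal(value)} ."
        for column, value in row.items()
        if column != subject_column
    ]


row = {"Sample Name": "liver1", "organism": "Mus musculus", "strain": "C57BL/6N"}
triples = row_to_triples(row)
for line in triples:
    print(line)
```

Each cell of the row becomes one statement about the sample, which is the step from "spreadsheets and tables" towards "facts as RDF statements". A fuller conversion would map columns to terms from shared vocabularies instead of an ad hoc namespace.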





4




Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data
Submit your experiments to public repositories	

    	





    Directly in ISA-Tab or reformatting using the ISAconverter	





5
Create your own repository	

	





Store the investigations in the database, assign access rights and
conduct maintenance tasks.	

Share, browse, query and view investigations, their
descriptions and access associated data files.	





               6
Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M.
Taxonomy-based Glyph Design -- with a Case Study on Visualizing
Workflows of Biological Experiments,
 IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012
                                   (in press)
A growing ecosystem of over 30 public and internal resources
using the ISA metadata tracking framework (ISA-Tab and/or
format) to facilitate standards-compliant collection, curation,
management and reuse of investigations in an increasingly diverse
set of life science domains, including:
•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
•  also by communities working to build a library of cellular signatures
Implementations at Harvard




  Importance of a local community
Implementations at Harvard




data sharing
 in ISA-Tab




               Importance of a local community
Implementation at the EBI




Data papers
Extensions of the



           Nanotechnology
      Informatics Working Group




Open source code




Development timeline (2007 to 2012)

Community involvement and uptake
   •  1st, 2nd and 3rd ISA-Tab workshops
   •  User workshops/visits start
   •  Other tools implement ISA-Tab
   •  1st public instance: Harvard Stem Cell Discovery Engine
   •  Growing number of systems start to adopt the ISA framework

Core developments
   •  Strawman ISA-Tab spec
   •  Final ISA-Tab spec
   •  ISA software v1
   •  Database instance at EBI
   •  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
   •  RDF format starts
   •  Links to analysis tools start

Publications
   •  Workshop reports
   •  Omics data sharing (Science)
   •  ISA-Tab and ISA software suite (Bioinformatics)
   •  Stem Cell Discovery Engine (NAR)
   •  ISA Commons (Nature Genetics)
Final remarks



        “The buzz around reproducible bioscience data:
       the policies, the communities and the standards”


“The reality from the buzz:
how to deliver reproducible bioscience data”
Your research and all (publicly
                                funded) research should
                                      make an … impact




       http://www.flickr.com/photos/equinoxefr/2620239993/                                                       CC BY
…..the biggest possible impact!




      http://www.flickr.com/photos/webhamster/2582189977/                                                        CC BY

http://www.flickr.com/photos/andrevanbortel/3745527869/sizes/m/in/photostream/
We must increase the level of annotation




   Notes in Lab Books (information for humans)
   Spreadsheets and Tables (the compromise)
   Facts as RDF statements (information for machines)

•  Invest in curating and managing data at the source by:
    •  using a common metadata tracking framework, such as ISA
    •  using publicly available and community-developed terminologies
    •  recording sufficient contextual information about the experimental steps
§  Progressively, datasets will become more comprehensible, interoperable,
    reproducible and (re)usable, underpinning future investigations

eScience-School-Oct2012-Campinas-Brazil

  • 1. The buzz around reproducible bioscience data: the policies, the communities and the standards Susanna-Assunta Sansone, PhD Principal Investigator and Team Leader, University of Oxford e-Research Centre, Oxford, UK Slides at: http://www.slideshare.net/SusannaSansone SPSAS e-SciBioEnergy Sao Paolo School of Advanced Science on e-Science for Bioenergy Research, 22-26 Oct, 2012, Campinas, Brazil
  • 2. Lab scientist! Data scientist! Team Leader! Consultant!
  • 5. Oxford e-Research Centre Providing research computing, high- performance computing Integrating with national and international infrastructure Supporting leading edge facilities through education and training
  • 6. Oxford e-Research Centre Collaborating with European and wider international groups in, e.g.: •  energy, •  radio astronomy, •  biological data federation, •  life sciences simulation, •  biodiversity, •  computational chemistry, •  neuroscience, •  digital humanities tools, •  digital music analysis Research in •  computation, •  data infrastructure and analysis, •  visualisation
  • 7. My team’s activities and groups we work with data management and biocuration, collaborative development of software and database, standards and ontology •  environmental genomics •  stem cell discovery •  metabolomics •  system biology •  metagenomics •  transcriptomics •  nanotechnology •  toxicogenomics •  proteomics •  environmental health env   agro   tox/pharma   health  
  • 9. Outline “The buzz around reproducible bioscience data: the policies, the communities and the standards” “The reality from the buzz: how to deliver reproducible bioscience data”
  • 10. Preserve institutional / corporate memory Harmonize collection across sites Find matching studies Data dissemination Long-term data stewardship 10
  • 11. Utilize public data Identify suitable data Retrieve Curate and harmonize Re-analyze 11
  • 12. Address reproducibility / reuse of public data 12
  • 13. Address reproducibility / reuse of public data 13
  • 14. Address reproducibility / reuse of public data Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 14 149-55 (2009) doi:10.1038/ng.295
  • 15. Address reproducibility / reuse of public data 15 15
  • 16. Address reproducibility / reuse of public data 16 16
  • 17. Address reproducibility / reuse of public data 17 17
  • 19. COMPREHENSIBLE INTEROPERABLE REPRODUCIBLE REUSABLE http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  • 20. Growing, worldwide movement for reproducible research Shared, annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work. Improved data sharing underpins science of the future “Publicly-funded research data are a public good, produced in the public interest” “Publicly-funded research data should be openly available to the maximum extent possible” The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
  • 21. Growing, worldwide movement for reproducible research Obstacles: esoteric formats; lack of sufficient contextual information; ad hoc or proprietary terminology. Questions they raise: comprehensible? interoperable? reusable? reproducible? §  Researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structure and enrich the annotation of •  entities of interest (e.g., genes, metabolites, phenotypes) and •  experimental steps (e.g., provenance of study materials, technology and measurement types)
  • 22. Structure and enrich description of the experiments §  Describe and communicate the information in an unambiguous, human and machine readable manner Example: “Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, RNA prepared…etc.” Annotations: “Seven” = Age value; “week” = Unit; “C57BL/6N” = Strain name; “mice” = Subject of the experiment; “treated” = Type of protocol - sample treatment; “low-fat diet” = Type of diet and experimental condition; “Liver” = Anatomy part; “RNA prepared” = Type of protocol - nucleic acid extraction
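The annotated sentence on slide 22 can be sketched as a small data structure. This is only an illustration: the pairing of fragments with labels is one reading of the slide, and the class, field and function names are hypothetical rather than part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str      # fragment of the free-text sentence
    category: str  # kind of information the fragment carries


SENTENCE = (
    "Seven week old C57BL/6N mice were treated with low-fat diet. "
    "Liver was dissected out, RNA prepared"
)

# Labels follow the slide; the pairing with fragments is an assumption.
ANNOTATIONS = [
    Annotation("Seven", "Age value"),
    Annotation("week", "Unit"),
    Annotation("C57BL/6N", "Strain name"),
    Annotation("mice", "Subject of the experiment"),
    Annotation("treated", "Type of protocol - sample treatment"),
    Annotation("low-fat diet", "Type of diet and experimental condition"),
    Annotation("Liver", "Anatomy part"),
    Annotation("RNA prepared", "Type of protocol - nucleic acid extraction"),
]


def by_category(annotations):
    """Index annotated fragments by category so software can query them."""
    return {a.category: a.text for a in annotations}


def all_grounded(sentence, annotations):
    """Check that every annotated fragment really occurs in the sentence."""
    return all(a.text in sentence for a in annotations)
```

Once the description is held this way, a question such as "which strain was used?" becomes a dictionary lookup instead of a text-mining problem.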
  • 23. Structure and enrich description of the experiments §  Describe and communicate the information in an unambiguous, human and machine readable manner Figure: credit to OBI consortium
  • 24. Reproducible & Reusable Bioscience Research
  • 25. reasoning visualization analysis browsing integration exchange retrieval Well-annotated & Structured Data Reproducible & Reusable Bioscience Research
  • 26. reasoning visualization analysis browsing integration exchange retrieval Community Software Standards Tools Well-annotated & Structured Data Reproducible & Reusable Bioscience Research
  • 28. Today’s bioscience research Publications Experimental and computational data §  Is interdisciplinary and integrative in character •  need to deal with new and existing datasets •  deal with a variety of data types §  ‘How the organism works’ is the focus •  Twenty years ago data was the center Source of the figure: EBI website
  • 29. Source: http://ebbailey.wordpress.com
  • 30. Example from the toxicogenomics domain Study looking at the effect of a compound inducing liver damage by characterizing/measuring - the metabolic profile by MS and NMR - protein expression in liver by MS - gene expression by DNA microarray - conducting genetic and phenotypic analysis Information contributing to the construction and validation of systems biology models
  • 31. Example of experiments by InnoMed PredTox, an FP6 public-private consortium
  • 32. Structured description of datasets §  Capture all salient features of the experimental workflow §  Make annotation explicit and discoverable §  Structure the descriptions for consistency, tracking independent variables and dependent variables, using cross-references and resolvable identifiers
  • 33. Not too much, not too little, just ‘right’ §  We must strike a balance between •  depth and breadth of information; and •  sufficient information required to reuse the data
  • 35. Information intensive experiments To make the experiments comprehensible and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated data. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
  • 36. Common ways to report and share § The challenges we face •  Large in volume: lots of data types and metadata! •  Lots of free text descriptions: hard to mine, subject to mistakes! •  Babel of terminologies: lack of definitions, hard to map! •  Heterogeneous file formats: software lock-in! § Need for reporting standards •  Minimal reporting descriptors - Report the same ‘core essentials’ •  Controlled vocabularies or ontology - Use the same word and mean the same thing •  Common exchange formats - Make tools interoperable, allow data exchange and integration
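The "use the same word and mean the same thing" point on slide 36 can be illustrated with a toy controlled vocabulary. This is a minimal sketch: the identifiers (`EX:0001`, `EX:0002`), labels and synonyms are made up for the example and do not come from any real ontology.

```python
# Hypothetical mini-vocabulary: identifiers and synonyms are illustrative only.
VOCABULARY = {
    "EX:0001": {"label": "liver", "synonyms": {"hepatic tissue", "hepar"}},
    "EX:0002": {"label": "kidney", "synonyms": {"renal tissue"}},
}


def build_lookup(vocabulary):
    """Map every label and synonym (lower-cased) to its term identifier."""
    lookup = {}
    for term_id, entry in vocabulary.items():
        for name in {entry["label"], *entry["synonyms"]}:
            lookup[name.lower()] = term_id
    return lookup


def normalize(free_text, lookup):
    """Return the term identifier for a free-text value, or None if unknown."""
    return lookup.get(free_text.strip().lower())


LOOKUP = build_lookup(VOCABULARY)
```

Two records that say "Liver" and "hepatic tissue" then resolve to the same identifier, while an unknown value such as "spleen" is flagged for curation rather than silently accepted.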
  • 37. Reporting standards – the benefits §  Describe and communicate the information to others, in an unambiguous manner §  To unlock the value in the data •  Compare, query and evaluate data - Facilitate scientific validation of the findings •  Understand variability within/between different technologies and protocols -  Facilitate technical validation -  Enable optimization of the experimental designs -  Identify critical checkpoints and develop quality metrics §  To define submission and/or publication requirements •  Journals •  Databases §  To ensure data integrity, reproducibility and (re)use
  • 38. Escalating number of standardization efforts in bioscience, e.g.: •  Genomic Standards Consortium (GSC) gensc.org •  Genome annotation www.geneontology.org •  Functional Genomics Data Society (FGED) www.fged.org •  Enzymology data standards www.strenda.org •  HUPO-Proteomics Standards Initiative (PSI) http://www.psidev.info •  Systems modelling standards www.co.mbine.org •  Cheminformatics www.ebi.ac.uk/chebi •  Pathways www.biopax.org •  Metabolomics Standards Initiative (MSI) http://www.metabolomicssociety.org
  • 39. Different community, different norms and standards, e.g.: •  report the same core, essential information •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another Challenges: lack of coordination, fragmentation and uneven coverage
  • 40. Is this ‘general mobilization’ good or bad? •  report the same core, essential information •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another §  Difference in structures and processes: •  organization types (open, closed to members, society, WG…) •  standards development (how to design, develop, evaluate, maintain…) •  adoption, uptake, outreach (link to journals, funders, commercial sector…) •  funds (sponsors, memberships, grants, volunteering…)
  • 41. Is this ‘general mobilization’ good or bad? •  report the same core, essential information •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another §  Fragmentation of the standards is a major issue •  Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, the development of (largely arbitrarily) different standards •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  • 42. Fragmentation of the databases and data, e.g. Access Storage Submission Three EBI omics systems
  • 43. Fragmentation of the databases and data, e.g. Access Storage Submission Three EBI omics systems
  • 44. Fragmentation of the databases and data, e.g. Access Storage Submission Three EBI omics systems
  • 45. Fragmentation of the databases and data, e.g. three EBI omics systems •  Access: DIFFERENT download formats •  Storage: DIFFERENT - core requirements represented - representation of the studies and related samples - curation practices •  Submission: DIFFERENT formats, terminologies and tools
  • 46. To integrate data we need interoperable standards [Figure: biologically-delineated views of the world (epidemiology, plant biology, microbiology) and technologically-delineated views of the world (e.g. transcriptomics and metabolomics; arrays and scanning, MS, gels, columns, NMR, FTIR) share generic features, a common core: - description of source biomaterial - experimental design components]
  • 47. Need to address the fragmentation §  Promote synergies •  Among basic academic (omics) research but also regulatory- or healthcare-driven initiatives §  Much could be learned from exchange of ideas and practices •  although regulatory- or healthcare-driven initiatives have far stricter guidelines •  although SDOs often hold ‘closed’ discussions and require membership §  Create interoperable standards •  Fit neatly into a jigsaw, resolving inconsistency and filling gaps §  Overcome several barriers •  Technical •  Funding •  Sociological…
  • 48. Eloquent quotes “Biologists would rather share their toothbrush than their gene name” Michael Ashburner, Professor of Genetics, University of Cambridge, UK “Any customer can have a car painted any colour that he wants so long as it is black” Henry Ford, you know who he is…
  • 49. Standards – an old issue, e.g. engineering in 1850 §  Buying nuts and bolts is easy today •  But in the 19th century it was very complicated!
  • 50. Standards – an old issue, e.g. engineering in 1850 §  Buying nuts and bolts is easy today •  But in the 19th century it was very complicated! §  Nuts and bolts were custom made •  Products from different shops were incompatible •  Craftsmen liked the monopoly - Customers were ‘locked in’ !! §  In 1864 William Sellers initiated the standardization •  Mass production •  Get interchangeable parts •  Standardized way to make nuts and bolts §  Generally adopted only after WWII, though …. !!
  • 51. Social engineering
  • 52. Ownership of open standards can be problematic in broad, grass-roots collaborations; it requires improved models to encourage maintenance of and contributions to these efforts, supporting their evolution
  • 53. The extensive community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors
  • 54. The cost of implementing a standards-supported data sharing vision is as large as the number of stakeholders that must operate synchronously
  • 55. 1. Funders actively developing data policies §  Several data preservation, management and sharing policies have emerged in response to increased funding for omics domains §  Even if only in general terms, standards are recognized as necessary ‘tools’ to unambiguously represent, describe and communicate research data
  • 57. 2. Similar trend in the regulatory arena §  “… lack of standardized data affects CDER’s review processes by curtailing a reviewer’s ability to perform integral tasks such as rapid acquisition, storage, analysis......efficient management of a portfolio of standards projects will require coordinated efforts and clear roles for multiple participants within/outside FDA”
  • 59. 3. Publishers have become strong advocates §  Continue to support the development of open standards and tools •  to support sharing of sufficiently well annotated datasets •  to enable comprehensible, reusable, reproducible research
  • 60. ….the rise of data-driven journals, e.g.: partnering with:
  • 61. The rise of data-driven journals, e.g.: partnering with:
  • 62. 4. Similar trend in the commercial sector §  R&D has invested heavily in procedures and tools that integrate external information with their own data to enhance the decision-making process •  Now joining forces to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
  • 63. …their information landscape is evolving (Yesterday / Today / Tomorrow) •  Players: Big Life Science Company, proprietary content provider, public content provider, CRO, academic group, regulatory authorities, service provider, software vendor •  Innovation model: innovation inside / searching for innovation / heterogeneity of collaborations; part of the wider ecosystem •  IT: internal apps & data / struggling with change, security and trust / cloud, services •  Data: mostly inside / in and out / distributed •  Portfolio: internally driven and owned / partially shared / shared portfolio Credit to: Pistoia Alliance
  • 67. Take home messages “The buzz around reproducible bioscience data: the policies, the communities and the standards” •  Contribute to the reproducible research movement •  Learn about open community-standards in your area •  Consider data science as a career path
  • 68. Outline “The buzz around reproducible bioscience data: the policies, the communities and the standards” “The reality from the buzz: how to deliver reproducible bioscience data”
  • 69. How do we achieve this? Is it possible to achieve a common, structured representation of diverse bioscience experiments that: •  follows the appropriate community standards and •  delivers COMPREHENSIBLE, INTEROPERABLE, REPRODUCIBLE, REUSABLE research?
  • 70. Growing number of reporting standards: MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 71. Growing number of reporting standards: counts shown are +303, +150 and +130 (sources given: MIBBI and EQUATOR; BioPortal; estimated), alongside databases, annotation and curation tools. MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 72. But how much do we know about these standards? MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 73. But how much do we know about these standards? •  Which tools and databases implement which standards? •  I use high throughput sequencing technologies, which ones are applicable to me? •  How can I get involved to propose extensions or modifications? •  What are the criteria to evaluate their status and value? •  Which ones are mature enough for me to use or recommend? •  I work on plants, are these just for biomedical applications?
  • 74. But how much do we know about these standards §  A bewildering array of standards is available, but •  these are hard to find, at different levels of maturity; in some areas duplications or gaps in coverage also exist §  Standards are just a ‘means to an end’, therefore •  we want to make them discoverable and accessible, maximizing their use to assist the virtuous data cycle, from generation to standardization through publication to subsequent sharing and reuse
  • 75. (2007) Vol 25 No 11 obofoundry.org
  • 76. Towards Lego-like ontologies §  Compound terms should be formed out of simpler constituents: •  Body weight = weight (quality ontology, PATO) that inheres_in (relation ontology, RO) whole_organism (anatomy ontology, CARO) •  Xylene contaminated soil = soil (environmental ontology, EnvO) that has_contaminated (relation ontology, RO) xylene (chemical ontology, ChEBI)
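The composition pattern on slide 76 (a term from one ontology, linked by a relation, to a term from another) can be sketched as follows. The labels and ontology abbreviations are taken from the slide; the class names and methods are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    ontology: str  # source ontology abbreviation as given on the slide


@dataclass(frozen=True)
class ComposedTerm:
    name: str
    head: Term      # the simpler term being specialised
    relation: Term  # relation taken from the relation ontology
    filler: Term    # the term the relation points to

    def definition(self) -> str:
        """Spell out the compound term from its constituents."""
        return f"{self.head.label} that {self.relation.label} {self.filler.label}"

    def sources(self):
        """List the ontologies the compound term draws on."""
        return sorted({self.head.ontology, self.relation.ontology, self.filler.ontology})


body_weight = ComposedTerm(
    name="Body weight",
    head=Term("weight", "PATO"),
    relation=Term("inheres_in", "RO"),
    filler=Term("whole_organism", "CARO"),
)

xylene_soil = ComposedTerm(
    name="Xylene contaminated soil",
    head=Term("soil", "EnvO"),
    relation=Term("has_contaminated", "RO"),
    filler=Term("xylene", "ChEBI"),
)
```

Because each constituent keeps its own ontology, no single ontology has to define "body weight" or "xylene contaminated soil" itself.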
  • 77. (2008) Vol 26 No 8 mibbi.org
  • 78. § Serves researchers, biocurators, journal editors and reviewers, and funders to §  discover checklists for a particular domain §  monitor progress of extant efforts §  facilitate collaborations
  • 79. Science (2009),Vol 326, 234-236 http://biosharing.org
  • 80. A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation) Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  • 81. •  A coherent, curated and searchable catalogue of data sharing resources •  Bioscience standards and associated data-sharing policies, publications, tools and databases •  Assessment criteria for usability and popularity of standards •  Relationships among standards •  Encouragement for communication & interaction among groups •  Promoting interoperability & informed decisions about standards
  • 82. Smith et al, 2007
  • 83. Smith et al, 2007; Taylor, Field, Sansone et al, 2008
  • 84. List of databases, linked to standards a collaboration with Database Issue
  • 85. List of databases, linked to standards a collaboration with Database Issue
  • 86. List of databases, linked to standards a collaboration with Database Issue
  • 87. Major challenge: define ‘relations’ among standards The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams. CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  • 89. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 90. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 91. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 92. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 93. An exemplar approach to the status quo §  A grass-roots collaborative that works to facilitate collection, curation and sharing of experiments using a common, structured representation of the experiments that •  transcends individual biological and technological domains and •  can be ‘configured’ to implement (several of) the community standards Reference: Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W. TOWARDS INTEROPERABLE BIOSCIENCE DATA. Feb 2012. doi:10.1038/ng.1054 www.biosharing.org www.isacommons.org
  • 94. An exemplar approach to the status quo §  A grass-roots collaborative that works to facilitate collection, curation and sharing of experiments using a common, structured representation of the experiments that •  transcends individual biological and technological domains and •  can be ‘configured’ to implement (several of) the community standards
  • 95. metadata tracking framework user community
  • 96. General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats, used by public repositories, e.g. MAGE-Tab Pride-xml SRA-xml SOFT
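To give a feel for a tabular, configurable metadata format of the kind slide 96 describes, here is a minimal sketch that reads a tab-delimited table. The column headers imitate the general ISA-Tab style, but the rows, the choice of columns and the helper names are assumptions made for illustration; this is not the ISA-Tab specification or the ISA software.

```python
import csv
import io

# A tiny, made-up table in the spirit of an ISA-Tab study file.
# Rows and column selection are illustrative assumptions.
STUDY_TABLE = (
    "Source Name\tCharacteristics[strain]\tProtocol REF\t"
    "Sample Name\tFactor Value[diet]\n"
    "mouse1\tC57BL/6N\tsample treatment\tliver1\tlow-fat diet\n"
    "mouse2\tC57BL/6N\tsample treatment\tliver2\thigh-fat diet\n"
)


def read_study(text):
    """Parse the tab-delimited table into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def samples_by_factor(rows, factor):
    """Group sample names by the value of an experimental factor."""
    column = f"Factor Value[{factor}]"
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row["Sample Name"])
    return groups
```

Because the experimental factor sits in its own explicitly named column, grouping samples by condition needs no free-text parsing.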
  • 97. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al, 2010) a collaborative effort of international research/service groups: University of Oxford, EBI, Harvard School of Public Health, NERC Environmental Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for Bioinformatics, Leibniz Institute of Plant Biochemistry and more….
  • 99. Create template(s) to fit the type of experiments to be described Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be •  text (with/without regular expression testing), •  ontology terms, •  numbers etc. 1
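The idea of a template that constrains each field to text (optionally checked by a regular expression), ontology terms or numbers can be sketched as below. The field names, the allowed term list and the pattern are assumptions for illustration; this is not the output or the file format of the ISA configurator.

```python
import re

# Hypothetical template: one rule per field.
TEMPLATE = {
    "Sample Name": {"type": "text", "pattern": r"[A-Za-z]+\d+"},
    "Organism part": {"type": "ontology", "allowed": {"liver", "kidney"}},
    "Age": {"type": "number"},
}


def validate(record, template):
    """Return a list of human-readable problems; empty means the record fits."""
    problems = []
    for field, rule in template.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif rule["type"] == "text":
            pattern = rule.get("pattern")
            if pattern and not re.fullmatch(pattern, value):
                problems.append(f"{field}: does not match {pattern}")
        elif rule["type"] == "ontology":
            if value not in rule["allowed"]:
                problems.append(f"{field}: '{value}' is not an allowed term")
        elif rule["type"] == "number":
            try:
                float(value)
            except ValueError:
                problems.append(f"{field}: '{value}' is not a number")
    return problems
```

Checking records against the template at entry time is what lets curation happen at the source rather than after submission.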
  • 100. Describe, curate your experiment with geographically-distributed collaborators Report and edit the description of the investigation using customized Google Spreadsheets (importing the ‘template’ created by the ISA configurator) enabled with ontology search and term-tagging features. 2a
  • 101. Or describe, curate your experiment using a desktop-based tool Report and edit the description using this tool, (also customized using the templates) with a spreadsheet like look and feel, packed with functionalities such as • ontology search (access via ) • term-tagging features • import from spreadsheets etc… 2b
  • 102. ISMB tag: #PP44 To mint DOIs; empowering researchers to use standards
  • 103. Perform data analysis We are building relevant ISA modules for GenomeSpace, R-based BioConductor and Galaxy tools 3
  • 104. Share your experiments with the world as Linked Open Data Through conversion to RDF; work in collaboration with the W3C HCLSIG 4
  • 105. Share your experiments with the world as Linked Open Data Through conversion to RDF; work in collaboration with the W3C HCLSIG 4 Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data
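As a rough illustration of what "conversion to RDF" means for a spreadsheet-like record, the sketch below emits N-Triples-style statements using only string formatting. The `http://example.org/` namespace, the property names and the function names are placeholders; this is not the ISA converter and does not use any vocabulary agreed with the W3C HCLSIG.

```python
BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def uri(local_name):
    """Wrap a local name as a full URI reference."""
    return f"<{BASE}{local_name}>"


def literal(value):
    """Quote a value as a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_triples(subject_id, record):
    """Turn one spreadsheet-like row into subject-predicate-object statements."""
    subject = uri(subject_id)
    return [
        f"{subject} {uri(key.replace(' ', '_'))} {literal(value)} ."
        for key, value in record.items()
    ]


triples = record_to_triples(
    "sample/liver1", {"strain": "C57BL/6N", "diet": "low-fat diet"}
)
```

Each cell of the row becomes one statement about the sample, which is the step from "information for humans" towards "information for machines".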
  • 106. Submit your experiments to public repositories Directly in ISA-Tab or reformatting using the ISAconverter 5
  • 107. Create your own repository Store the investigations in the database, assign access rights and conduct maintenance tasks. Share, browse, query and view investigations, their descriptions and access associated data files. 6
  • 108. Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
  • 109. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or format) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: •  environmental health •  environmental genomics •  metabolomics •  metagenomics •  nanotechnology •  proteomics •  stem cell discovery •  system biology •  transcriptomics •  toxicogenomics •  also by communities working to build a library of cellular signatures
  • 110. Implementations at Harvard Importance of a local community
  • 111. Implementations at Harvard data sharing in ISA-Tab Importance of a local community
  • 112. Implementations at Harvard data sharing in ISA-Tab Importance of a local community
  • 113. Implementation at the EBI
  • 115. Extensions of the Nanotechnology Informatics Working Group
  • 116. Development timeline, 2007 to 2012 (open source code) •  Community involvement and uptake: 1st ISA-Tab workshop; 2nd ISA-Tab workshop; 3rd ISA-Tab workshop; user workshops/visits start; 1st public instance: Harvard Stem Cell Discovery Engine; other tools implement ISA-Tab; growing number of systems starts to adopt ISA framework •  Core developments: strawman ISA-Tab spec; final ISA-Tab spec; ISA software v1; database instance at EBI; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; links to analysis tools start; RDF format starts •  Publications: workshop reports; Omics data sharing (Science); ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA-Tab and ISA Commons (Nature Genetics)
  • 117. Final remarks “The buzz around reproducible bioscience data: the policies, the communities and the standards” “The reality from the buzz: how to deliver reproducible bioscience data”
  • 118. Your research and all (publicly funded) research should make an … impact http://www.flickr.com/photos/equinoxefr/2620239993/ CC BY
  • 119. …the biggest possible impact! http://www.flickr.com/photos/webhamster/2582189977/ CC BY
  • 121. We must increase the level of annotation: Notes in Lab Books (information for humans), Spreadsheets and Tables (the compromise), Facts as RDF statements (information for machines) •  Invest in curating and managing data at the source using: •  a common metadata tracking framework, such as ISA •  publicly available and community-developed terminologies •  recording sufficient contextual information of the experimental steps §  Progressively datasets will become more comprehensible, interoperable, reproducible and (re)usable, underpinning future investigations