ISA - a short overview - Dec 2013
1. Overview of the ISA format and software suite
Help researchers to
curate, store, analyse, share and publish their experiments
Susanna-Assunta Sansone, PhD (associate director, PI)
Philippe Rocca-Serra, PhD (technical coordinator)
Alejandra Gonzalez-Beltran, PhD (senior developer)
Eamonn Maguire, DPhil candidate (senior developer)
Pavlos Georgiou, MSc candidate (developer)
and new team member to be recruited
2. Focus on the experimental context and compliance to standards
user community
The International Conference on Systems Biology (ICSB), 22-28 August, 2008
Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
3. Rationale for developing ISA
Researchers and bioinformaticians in both
academic and commercial arenas, along with
funding agencies and publishers, embrace
the concept that community-developed
standards are pivotal to structure and enrich
the annotation of
• entities of interest (e.g., genes,
metabolites, phenotypes) and
• experimental steps (e.g., provenance of
study materials, technology and
measurement types)
4. Rationale for developing ISA
Capture all salient features of
the experimental workflow
Make annotation explicit and
discoverable
Support data provenance
tracking
Use community standards
6. A wealth of community, different norms and standards, e.g.:
allow data to flow from
one system to another
use the same word and
refer to the same ‘thing’
report the same core,
essential information
To track provenance of the information and ensure richness of data and experimental
metadata descriptions, to maximize sharing and reusability
Key challenges:
lack of coordination, fragmentation and uneven coverage
7. To compare and integrate data we need interoperable standards
Biologically-delineated views of the world: epidemiology, plant biology, microbiology
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics (technologies shown: arrays, scanning, MS, gels, columns, NMR, FTIR)
Generic features (‘common core’)
- description of source biomaterial
- experimental design components
8. Mapping the landscape of standards, work in progress
See more at:
Figure: landscape of standards. Counts shown: + 303, + 130, + 150 (one labelled "Estimated"). Sources cited: MIBBI, EQUATOR; BioPortal. Also shown: databases, annotation, curation tools.
Standards named in the figure: MAGE-Tab, GCDML, AAO, CHEBI, SRAxml, CML, SOFT, DICOM, FASTA, GELML, MITAB, SEDML, OBI, VO, PATO, ENVO, XAO, MzML, DO, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MOD, SBRML, MIAME, TEDDY, PRO, BTO, IDO, MIQE, MIAPE, CIMR, MIASE, CONSORT, MISFISHIE, …
10. Dealing with fragmented standards for the experimental context
user community
11. General-purpose, configurable format, designed to
support:
- several omics standards checklists, terminologies
- reference to CDISC SDTM file(s), and
- conversions to (a growing number of) other metadata
formats, used by public repositories
13. Create template(s) to fit the type of experiments to be described
Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be
• text (with/without regular expression testing),
• ontology terms,
• numbers etc.
We now have configurations for submission to EBI repositories, complying with several community standards.
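The idea of a template that restricts each field to text (optionally checked against a regular expression), ontology terms or numbers can be sketched as follows. This is only an illustration: `FieldRule`, `check_value`, `validate_row` and the example field names are hypothetical and do not reflect the ISA configurator's actual file format or API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class FieldRule:
    """Hypothetical rule for one template field (not the real ISA schema)."""
    name: str
    kind: str                              # "text", "ontology" or "number"
    pattern: Optional[str] = None          # regular expression for "text" fields
    allowed_terms: Sequence[str] = ()      # permitted terms for "ontology" fields


def check_value(rule: FieldRule, value: str) -> bool:
    """Return True if the value satisfies the rule for its field."""
    if rule.kind == "text":
        # Free text passes unless a regular expression is configured.
        return rule.pattern is None or re.fullmatch(rule.pattern, value) is not None
    if rule.kind == "ontology":
        return value in rule.allowed_terms
    if rule.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule.kind}")


def validate_row(template: Sequence[FieldRule], row: Dict[str, str]) -> List[str]:
    """Return the names of fields that are missing or violate their rule."""
    return [
        rule.name
        for rule in template
        if rule.name not in row or not check_value(rule, row[rule.name])
    ]


# Assumed example template; the field names and terms are made up.
template = [
    FieldRule("Sample Name", "text", pattern=r"[A-Za-z0-9_-]+"),
    FieldRule("Organism", "ontology", allowed_terms=("Homo sapiens", "Mus musculus")),
    FieldRule("Dose", "number"),
]
```

Calling `validate_row(template, row)` on a dictionary of field values returns the list of offending field names, so an empty list means the row fits the template.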
14. Or describe, curate your experiment using a desktop-based tool
Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as
• ontology search (access via )
• term-tagging features
• import from spreadsheets etc.
16. Describe, curate your experiment with geographically distributed collaborators
Report and edit the description of the
investigation using customized Google
Spreadsheets (importing the ‘template’ created
by the ISA configurator) enabled with ontology
search and term-tagging features.
23.
• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: Data Descriptor, narrative + structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptor will be complementary to traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
www.nature.com/scientificdata
25.
A grassroots collaborative that works to facilitate collection, curation and sharing of experiments using a common, structured representation of the experiments that
• transcends individual biological and technological domains and
• can be ‘configured’ to implement (several of) the community standards
environmental health
genomics
metabolomics
metagenomics
nanotechnology
proteomics
stem cell discovery
systems biology
transcriptomics
toxicogenomics
26. Community involvement and uptake
Timeline axis: 2007 to 2013
• 1st, 2nd and 3rd ISA-Tab workshops
• User workshops/visits start
• Other tools implement ISA-Tab
Core developments
• Straw man ISA-Tab spec
• ISA software v1
• Final ISA-Tab spec
• 1st public instance: Harvard Stem Cell Discovery Engine
• Growing number of systems start to adopt ISA framework
• Conversions to Pride-XML/SRA-XML/MAGE-Tab
• Database instance at EBI
• Links to analysis tools start
• RDF/OWL format starts
Publications
• The ISA software suite: supporting standards-compliant curation at the community level. Bioinformatics
• ISA chapter in: Open Source Software in Life Science Research. Woodhead Publishing
• OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets. Bioinformatics