Experiences to learn from the MS proteomics field
1. Experiences to learn from the mass
spectrometry proteomics field
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
12th Conference of the Metabolomics Society
Dublin, 27 June 2016
•Develops data format standards for proteomics.
•Both data representation and annotation standards.
•Involves data producers, database providers, software producers,
publishers, …
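•Active workgroups: Molecular Interactions (MI), Mass Spectrometry (MS), Proteomics Informatics (PI), and now a new Quality Control (QC) group.
•Inter-group activities: MIAPE (Minimum Information About a Proteomics Experiment) guidelines and controlled vocabularies.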
•Started in 2002, so some experience already…
•One annual meeting in March-April, regular phone calls.
•Peer Review for standards: PSI document process.
http://www.psidev.info
HUPO Proteomics Standards Initiative
3. Juan A. Vizcaíno
Current PSI Proteomics Standard File Formats for
Mass Spectrometry
• mzML: MS data
• mzIdentML: Identification
• mzQuantML: Quantitation
• mzTab: Final results
• TraML: SRM
5. Juan A. Vizcaíno
• mzML is already actively used to store MS data
(a very flexible format).
• mzTab is a tab-delimited format that is being
extended to better support MS metabolomics data.
It can be used for both identification
and quantification results.
• Meeting next week in Liverpool organised by A. Jones.
• mzQuantML and TraML could be used with small
molecule data, but this has not been tested.
Reuse of data standards in metabolomics
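Because mzTab is tab-delimited and each line starts with a short section prefix, a reader can be sketched with the standard library alone. The following is a minimal illustration, not a full or validating parser. It assumes the mzTab 1.0 line prefixes (MTD for metadata, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and row lines of each section, COM for comments). The function name read_mztab, the fragment below, and the "EXAMPLE:" identifiers are made up for illustration, and the fragment carries only a few of the columns a real file would have.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, abbreviated mzTab-like fragment; all values are invented.
EXAMPLE = (
    "COM\tillustrative fragment, not a complete mzTab file\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\thypothetical small-molecule example\n"
    "SMH\tidentifier\tchemical_formula\texp_mass_to_charge\n"
    "SML\tEXAMPLE:0001\tC6H12O6\t181.0707\n"
    "SML\tEXAMPLE:0002\tC3H7NO2\t90.0550\n"
)

# Header prefix -> prefix of the data rows that the header describes.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab(handle):
    """Split mzTab lines into a metadata dict and per-section row lists."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0] == "COM":
            continue  # skip blank lines and comments
        prefix, rest = fields[0], fields[1:]
        if prefix == "MTD":
            # Metadata lines: key in column 2, value in column 3.
            metadata[rest[0]] = rest[1] if len(rest) > 1 else ""
        elif prefix in HEADER_TO_ROW:
            # Remember the column names for the matching row prefix.
            headers[HEADER_TO_ROW[prefix]] = rest
        elif prefix in headers:
            # Data row: pair each value with its column name.
            tables[prefix].append(dict(zip(headers[prefix], rest)))
    return metadata, dict(tables)


metadata, tables = read_mztab(io.StringIO(EXAMPLE))
print(metadata["mzTab-version"])
for row in tables["SML"]:
    print(row["identifier"], row["exp_mass_to_charge"])
```

The same loop handles identification and quantification sections, since they differ only in prefix and column names.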
6. Juan A. Vizcaíno
Current Standard File Formats that are or could be
used in metabolomics
• mzML: MS data
• mzIdentML: Identification
• mzQuantML*: Quantitation
• mzTab: Final results
• TraML*: SRM
7. Juan A. Vizcaíno
Current vision for data exchange standards in MS
Neumann (IPB-Halle), Proteomics and HUPO-PSI community
8. Juan A. Vizcaíno
imzML: data standard for mass imaging data
http://www.imzml.org
Not a PSI format, but based on mzML
9. Juan A. Vizcaíno
qcML files to be generated after submission
• XML format that captures output from QC pipelines
10. Juan A. Vizcaíno
• Don’t reinvent the wheel! There is no need…
• Software libraries (APIs) to handle the standards.
• Data converters.
• Data visualisation tools.
• Data analysis tools and workflows.
• A large proportion of the available software is open
source.
Opportunity to reuse and extend existing software
11. Juan A. Vizcaíno
mzML: more software available
The most popular search engines support mzML
Many parser libraries available
Conversion from raw files into mzML
http://www.psidev.info/mzml_1_0_0
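To give a sense of what the parser libraries mentioned above do under the hood, here is a minimal sketch of decoding one mzML binary data array with the Python standard library only. It assumes the usual mzML encoding: base64 text holding little-endian 32-bit or 64-bit floats, optionally zlib-compressed. The function names and the sample values are hypothetical. Real files declare precision and compression through controlled vocabulary terms on each array, and reading those terms is left out here.

```python
import base64
import struct
import zlib


def _format_code(precision_bits):
    """Map a float width in bits to its struct format character."""
    if precision_bits == 64:
        return "d"
    if precision_bits == 32:
        return "f"
    raise ValueError("precision_bits must be 32 or 64")


def decode_binary_array(text, precision_bits=64, zlib_compressed=False):
    """Decode base64 (optionally zlib-compressed) little-endian floats."""
    code = _format_code(precision_bits)
    raw = base64.b64decode(text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // (precision_bits // 8)
    return list(struct.unpack("<%d%s" % (count, code), raw))


def encode_binary_array(values, precision_bits=64, zlib_compressed=False):
    """Inverse of decode_binary_array, used here to build sample input."""
    code = _format_code(precision_bits)
    raw = struct.pack("<%d%s" % (len(values), code), *values)
    if zlib_compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


# Round trip on made-up m/z values that are exactly representable as floats.
mz_values = [100.0, 200.5, 300.25]
encoded = encode_binary_array(mz_values, zlib_compressed=True)
decoded = decode_binary_array(encoded, zlib_compressed=True)
print(decoded)
```

In practice an existing parser library is the better choice, in line with the advice not to reinvent the wheel; the sketch only shows that the encoding itself is simple.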
12. Juan A. Vizcaíno
Data visualisation: PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML
- mzML and other spectrum file formats
- mzTab (identification and quantification)
https://github.com/PRIDE-Toolsuite/
13. Juan A. Vizcaíno
OpenMS/TOPP
• OpenMS – an open-source C++ framework for computational mass
spectrometry
• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen
• Open source: BSD 3-clause license
• Portable: available on Windows, OSX, and Linux
• TOPP – The OpenMS Proteomics Pipeline
• Building blocks: one application for each analysis step
• All applications share identical user interfaces
• Uses PSI standard formats and integrates seamlessly with other applications supporting
these formats
• Can be integrated in various workflow systems: TOPPAS (TOPP Pipeline Assistant), Galaxy, WS-PGRADE, KNIME
Kohlbacher et al., Bioinformatics (2007), 23:e191
14. Juan A. Vizcaíno
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE
(Cambridge, UK) and MassIVE (UCSD, San Diego);
jPOST (Japan) will be integrated in July 2016.
• EU FP7 CA (01/2011 to 06/2014).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
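A common identifier space is easy to work with programmatically. As a small sketch, the snippet below checks and extracts dataset accessions, assuming the form "PXD" followed by six digits. The function names are hypothetical, and other ProteomeXchange accession types are deliberately not covered. PXD000001 appears only as an example of the format.

```python
import re

# Assumed accession shape: the letters PXD followed by exactly six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_accession(text):
    """Return True if the whole string looks like a PXD accession."""
    return PXD_PATTERN.fullmatch(text) is not None


def find_pxd_accessions(text):
    """Return all PXD-like accessions mentioned in free text, in order."""
    return re.findall(r"\bPXD\d{6}\b", text)


sentence = "Data are available via ProteomeXchange with identifier PXD000001."
print(is_pxd_accession("PXD000001"))
print(find_pxd_accessions(sentence))
```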
15. Juan A. Vizcaíno
PRIDE Archive submitted datasets up until 1st April, 2016
• In the last complete year: on average, >150 submitted datasets per
month
• Size of PRIDE Archive: ~ 220TB
16. Juan A. Vizcaíno
Vendor support for mzIdentML has grown in
parallel with the number of submitted datasets
[Diagram labels: search engines; search engine results + MS files; mzIdentML]
- Mascot
- MSGF+
- Myrimatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker (several open source tools)
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem (from PILEDRIVER version)
- Others: library for X!Tandem conversion, lab
internal pipelines, …
- Crux
An increasing number of tools support export to mzIdentML 1.1
Updated list: http://www.psidev.info/tools-implementing-
17. Juan A. Vizcaíno
•Develop tools in parallel with the data standards.
•Don’t reinvent the wheel! Many ideas and software already
there.
•Ideally, get vendors involved as soon as possible.
•Data repositories and data standards are a perfect match.
Conclusions
18. Juan A. Vizcaíno
Acknowledgements and further reading…
http://www.psidev.info
Poster P18