2. Proteomics Bioinformatics
EMBL-EBI, December 2016
Outline
• Introduction to OpenMS
  Modularity & workflows
  Visualization
  Integration with other tools
• Two example workflows
  Protein identification
  Label-free quantification
5. Proteomics Bioinformatics
Modularity
tools for identification
DecoyDatabase
MascotAdapter
XTandemAdapter
MSGFPlusAdapter
PeptideIndexer
FalseDiscoveryRate
IDPosteriorErrorProbability
ConsensusID
LuciphorAdapter
HighResPrecursorMassCorrector
FidoAdapter
tools for quantification
PeakPickerHiRes
FeatureFinderMultiplex
FeatureFinderCentroided
SpectraMerger
NoiseFilterSGolay
ITRAQAnalyzer
IDMapper
IDConflictResolver
MapAlignerPoseClustering
MapRTTransformer
FeatureLinkerUnlabeledQT
ProteinQuantifier
tools for file handling
FileConverter
FileMerger
FileFilter
IDFileConverter
IDMerger
IDFilter
MzTabExporter
FileInfo
OpenMS ⇨ collection of > 180 software tools
≈ 30 tools sufficient for standard workflows
6. Proteomics Bioinformatics
OpenMS
OpenMS – an open-source framework for computational mass spectrometry
Portable: available on Windows, OSX, Linux
OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools
• > 180 Building blocks: One application for each analysis step
• Vendor independent: Uses PSI standard formats
Can be integrated in various workflow systems
• TOPPAS – TOPP Pipeline Assistant
• Galaxy
• KNIME
7. Proteomics Bioinformatics
KNIME and TOPPView
KNIME – KoNstanz Information MinEr
• Enables building customized workflows from OpenMS
components.
TOPPView: an OpenMS data viewer.
• Based on standard file formats.
• Displays MS/MS information, peptides/proteins, and
quantitative information.
8. Proteomics Bioinformatics
KNIME – Workflow System
KNIME – KoNstanz Information MinEr
Industrial-strength general-purpose workflow system
Convenient and easy-to-use graphical user interface
Available for Windows, OSX, Linux at http://KNIME.org
[Figure: KNIME workbench screenshot, KNIME (CC BY-SA 4.0), with labelled areas: Workflows, Nodes, Plots, Tables, Console]
9. Proteomics Bioinformatics
Workflow Builder: Data Flow
KNIME-OpenMS workflows consist of distinct nodes
that are assembled into workflows
Either tables or files are exchanged between nodes
along the edges of the workflow
Configuration dialogs are used to set node
parameters
Loops allow iterating sequentially over lists of data
Switches allow executing nodes or subworkflows
depending on a condition
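The data-flow idea can be illustrated outside KNIME with a minimal Python sketch. The node functions (`pick_peaks`, `identify`), the file names, and the `run_workflow` helper are hypothetical stand-ins, not KNIME or OpenMS APIs. The loop iterates sequentially over a file list, and the switch runs a node only when a condition holds.

```python
from typing import List

# Hypothetical stand-ins for workflow nodes: each takes a file name and
# returns the name of the file it would produce.
def pick_peaks(path: str) -> str:
    return path.replace(".mzML", ".picked.mzML")

def identify(path: str) -> str:
    return path.replace(".mzML", ".idXML")

def run_workflow(files: List[str], centroid_first: bool) -> List[str]:
    results = []
    for path in files:                  # loop: iterate over the input list
        if centroid_first:              # switch: run this node only on demand
            path = pick_peaks(path)
        results.append(identify(path))  # edge: one node's output feeds the next
    return results

outputs = run_workflow(["a.mzML", "b.mzML"], centroid_first=True)
print(outputs)
```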
10. Proteomics Bioinformatics
Scripting
KNIME permits the embedding of R code for advanced statistics
Embedding of R scripts using the R Snippet node
All plotting capabilities of R can be used as well
11. Proteomics Bioinformatics
Peptide/Protein Identification
Task: Identify peptides in multiple samples
Mass spectra enter workflow on the left
Loop nodes permit repeated execution of parts of the workflow
Identified proteins end up in result files (right side)
14. Proteomics Bioinformatics
Workflow – Plug-In System
Task: Identify peptides in multiple samples
Combination of X!Tandem and OMSSA
Definition of QC parameters such as FDR, q-values, and p-values.
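How FDR and q-values relate can be sketched with a target-decoy calculation. This is a minimal illustration, assuming higher scores are better and using the simple estimator FDR = decoys / targets. The function name `q_values` and the example scores are hypothetical, and this is not claimed to be the implementation used by any OpenMS node.

```python
from typing import List, Tuple

def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better.
    Returns q-values in order of descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate when accepting everything down to this score
        fdrs.append(decoys / max(targets, 1))
    # q-value: the minimum FDR at this or any more permissive threshold
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    return list(reversed(q))

# Hypothetical PSM scores; True marks a decoy hit.
example = [(9.1, False), (8.4, False), (7.9, True), (7.5, False), (6.0, True)]
qs = q_values(example)
print([round(v, 3) for v in qs])
```

Accepting PSMs with a q-value below a chosen cutoff (for example 0.01) then bounds the estimated FDR of the accepted set.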
16. Proteomics Bioinformatics
Some of the Identification nodes
IDPosteriorErrorProbability
Computes the posterior error probability for each PSM
and writes a new file with the corresponding values.
ConsensusID
Combines PSM identifications from multiple search
engines.
Generates a combined posterior error probability for
each PSM.
For each peptide ID, uses the best score of any
search engine as the consensus score.
FalseDiscoveryRate
Estimates the false discovery rate at the peptide and
protein level using decoy searches.
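The "best score of any search engine" rule described for ConsensusID can be sketched as follows. This is an illustration only: it assumes each engine's result has already been converted to a posterior error probability (lower is better), and the function `consensus_best`, the spectrum IDs, and the peptide sequences are hypothetical.

```python
from typing import Dict, Tuple

PsmKey = Tuple[str, str]  # (spectrum id, peptide sequence)

def consensus_best(*engines: Dict[PsmKey, float]) -> Dict[PsmKey, float]:
    """Each input maps a PSM to its posterior error probability.
    The consensus score is the best (lowest) value from any engine."""
    merged: Dict[PsmKey, float] = {}
    for results in engines:
        for key, pep in results.items():
            merged[key] = min(pep, merged.get(key, 1.0))
    return merged

# Hypothetical results from two search engines.
xtandem = {("scan1", "PEPTIDEK"): 0.02, ("scan2", "LVNELTEFAK"): 0.30}
omssa = {("scan1", "PEPTIDEK"): 0.05, ("scan2", "LVNELTEFAK"): 0.10,
         ("scan3", "AAAK"): 0.40}

consensus = consensus_best(xtandem, omssa)
for key, pep in sorted(consensus.items()):
    print(key, pep)
```

A PSM reported by only one engine keeps that engine's value, so the merged result is the union of both result sets.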
17. Proteomics Bioinformatics
Adapters and Complementary Nodes
FileMerger
This node takes two files (or file lists) as input and
outputs a merged list of both inputs. The order
corresponds to the order of the input lists and ports.
IDMerger
Merges several protein/peptide identification files
into one file.
PeptideIndexer
Refreshes the protein references for all peptide hits.
IDFilter
Filters results from protein or peptide identification
engines based on different criteria.
18. Proteomics Bioinformatics
Quantitative Proteomics
• Relative quantification
  Labeled
    In vivo: 14N/15N, SILAC
    In vitro: iTRAQ, TMT, 16O/18O
  Label-free: spectral counting, MRM, feature-based
• Absolute quantification: AQUA, SISCAPA
And many more…
19. Proteomics Bioinformatics
Label-Free Quantification (LFQ)
Label-free quantification is probably the most natural way of
quantifying
• No labeling required, removing further sources of error, cheap
• Different samples acquired in different measurements – higher
reproducibility needed
• Manual analysis difficult
• Scales very well with the number of samples, basically no limit,
no difference in the analysis between 2 or 100 samples
20. Proteomics Bioinformatics
Feature-based LFQ - LC-MS Maps
Spectra are acquired with rates up to dozens per second
Stacking the spectra yields peak maps
Resolution:
• Up to millions of points per spectrum
• Tens of thousands of spectra per LC run
Huge 2D datasets of up to hundreds of GB per sample
[Figure: LC-MS peak map with a feature (eluting peptide) and its quantification (3x over-expressed, …)]
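A back-of-envelope calculation shows how the figures above add up to hundreds of GB. The concrete values are assumptions chosen within the stated ranges, and the 8 bytes per data point (a 32-bit m/z plus a 32-bit intensity, uncompressed) is likewise an assumption.

```python
points_per_spectrum = 1_000_000  # "up to millions of points per spectrum"
spectra_per_run = 50_000         # "tens of thousands of spectra per LC run"
bytes_per_point = 8              # assumption: 32-bit m/z + 32-bit intensity

total_bytes = points_per_spectrum * spectra_per_run * bytes_per_point
total_gb = total_bytes / 1e9     # decimal gigabytes
print(f"{total_gb:.0f} GB")      # prints "400 GB"
```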
21. Proteomics Bioinformatics
Feature-based LFQ
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features
5. Quantify features
6. Quantify proteins based on
their peptides
Protein        Sample 1 : Sample 2 : Sample 3
NPC2_HUMAN     1.0 : 5.2 : 0.3
CD177_HUMAN    1.0 : 0.2 : 0.4
⋮
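The last two steps (quantify features, then quantify proteins based on their peptides) can be sketched as follows. The peptide-level intensities are invented so that the ratios match the table, the function `protein_ratios` is hypothetical, and summing peptide intensities is only one possible way to aggregate.

```python
from typing import Dict, List, Tuple

def protein_ratios(features: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """features: (protein, per-sample intensities) for each identified,
    linked peptide feature. Returns per-protein ratios relative to sample 1.
    Assumes every protein has a non-zero summed intensity in sample 1."""
    totals: Dict[str, List[float]] = {}
    for protein, intensities in features:
        sums = totals.setdefault(protein, [0.0] * len(intensities))
        for i, value in enumerate(intensities):
            sums[i] += value            # aggregate peptides per protein
    return {p: [round(v / s[0], 1) for v in s] for p, s in totals.items()}

# Hypothetical peptide feature intensities in samples 1 to 3.
features = [
    ("NPC2_HUMAN", [100.0, 500.0, 40.0]),
    ("NPC2_HUMAN", [200.0, 1060.0, 50.0]),
    ("CD177_HUMAN", [500.0, 100.0, 200.0]),
]

ratios = protein_ratios(features)
for protein, r in ratios.items():
    print(protein, " : ".join(str(x) for x in r))
```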
22. Proteomics Bioinformatics
Label-Free Workflow
Different algorithms have been proposed by the OpenMS community for
label-free quantification:
• Weisser H., Journal of Proteome Research (2013)
• Zhang B., Molecular & Cellular Proteomics (2016)
• Veit J., Journal of Proteome Research (2016)
• Ranninger C., Analytica Chimica Acta (2016)
25. Proteomics Bioinformatics
LFQ-Relevant Nodes
FeatureFinderCentroided
Detects two-dimensional features in LC-MS data.
MapAlignerPoseClustering
Corrects retention time distortions between maps
using a pose clustering approach.
FeatureLinkerUnlabeledQT
Groups corresponding features from multiple maps.
ConsensusMapNormalizer
Normalizes maps of one consensusXML file.
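What normalizing the maps of a consensus file means can be sketched with a simple median-ratio scaling. This is an illustrative approach under stated assumptions, not necessarily the method used by ConsensusMapNormalizer. The function name and the intensities are hypothetical, and map 0 is arbitrarily taken as the reference.

```python
from statistics import median
from typing import List

def normalize_by_median_ratio(consensus: List[List[float]]) -> List[List[float]]:
    """consensus: one row per linked feature, one column per map.
    Scales every map so that its median ratio to map 0 becomes 1."""
    n_maps = len(consensus[0])
    factors = []
    for m in range(n_maps):
        # only use features observed (non-zero) in both maps
        ratios = [row[m] / row[0] for row in consensus
                  if row[0] > 0 and row[m] > 0]
        factors.append(median(ratios))
    return [[row[m] / factors[m] for m in range(n_maps)] for row in consensus]

# Hypothetical linked features: map 1 was measured about 2x more intense overall.
consensus = [[100.0, 200.0], [50.0, 100.0], [10.0, 40.0]]
normalized = normalize_by_median_ratio(consensus)
print(normalized)
```

After scaling, the third feature still differs by a factor of two between the maps, which is the kind of difference that remains once the global intensity shift is removed.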
26. Proteomics Bioinformatics
OpenMS at Large Scale
Galaxy
WS-PGRADE/gUSE
KNIME
Each individual tool can be run from the command line, which makes it
possible to distribute the tools in large HPC environments.
$> FileFilter -in myinfile.mzML -levels 2 -rt 100:1500 -out myoutfile.mzML
$> OpenSwathDecoyGenerator.exe -in OpenSWATH_SGS_AssayLibrary.TraML -out OpenSWATH_SGS_AssayLibrary_with_Decoys.TraML -method shuffle -append -exclude_similar -remove_unannotated
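Because every tool is a plain command-line program, distributing work amounts to generating one command per input file. The sketch below builds FileFilter commands using only the options shown above. The helper `file_filter_command`, the input file names, and the output directory are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from pathlib import Path
from typing import List

def file_filter_command(infile: str, outdir: str) -> List[str]:
    """Build one FileFilter call (MS level 2, RT 100 to 1500) for one input file."""
    out = Path(outdir) / (Path(infile).stem + "_ms2.mzML")
    return ["FileFilter", "-in", infile, "-levels", "2",
            "-rt", "100:1500", "-out", out.as_posix()]

# Hypothetical input files; each command could become one cluster job.
jobs = [file_filter_command(f, "filtered") for f in ["run1.mzML", "run2.mzML"]]
for cmd in jobs:
    print(shlex.join(cmd))
```

On a machine with OpenMS installed, each command list could be passed to `subprocess.run(cmd, check=True)` or written into a scheduler's job script.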
27. Conclusions
• OpenMS modular workflow system
• standard workflows:
SILAC, iTRAQ/TMT, label-free, SWATH, quality
control
• strong collaboration with other projects:
ProteoWizard, Thermo PD, KNIME, Fido,
Percolator, search engines, HUPO-PSI formats
28. How to run OpenMS workflows
• OpenMS, local installation
(Windows, OS X, Linux)
http://bit.ly/1J6lz6h
http://openms.de/workflows
• OpenMS in Proteome Discoverer
(LFQProfiler and RNPxl for PD 2.1)
http://openms.de/PD
• OpenMS in Galaxy
http://galaxy.uni-freiburg.de
• OpenMS in KNIME
https://tech.knime.org/community/bioinf/openms