OpenMS: Quantitative Proteomics Tools and Workflows

Yasset Perez-Riverol Ph.D
github: github.com/ypriverol
twitter: @ypriverol
OpenMS: Quantitative proteomics at large scale

Proteomics Bioinformatics
EMBL-EBI, December 2016

Outline
• Introduction to OpenMS
  Modularity & Workflows
  Visualization
  Integration with other tools
• Two example workflows
  Protein identification
  Label-free quantification

Modularity is the degree to which a system's components may
be separated and recombined.

Modularity
tools for identification
DecoyDatabase
MascotAdapter
XTandemAdapter
MSGFPlusAdapter
PeptideIndexer
FalseDiscoveryRate
IDPosteriorErrorProbability
ConsensusID
LuciphorAdapter
HighResPrecursorMassCorrector
FidoAdapter
tools for quantification
PeakPickerHiRes
FeatureFinderMultiplex
FeatureFinderCentroided
SpectraMerger
NoiseFilterSGolay
ITRAQAnalyzer
IDMapper
IDConflictResolver
MapAlignerPoseClustering
MapRTTransformer
FeatureLinkerUnlabeledQT
ProteinQuantifier
tools for file handling
FileConverter
FileMerger
FileFilter
IDFileConverter
IDMerger
IDFilter
MzTabExporter
FileInfo
OpenMS ⇨ collection of 180 software tools
≈ 30 tools sufficient for standard workflows

OpenMS
OpenMS – an open-source framework for computational mass spectrometry
Portable: available on Windows, OSX, Linux
OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools
• > 180 Building blocks: One application for each analysis step
• Vendor independent: Uses PSI standard formats
Can be integrated in various workflow systems
• TOPPAS – TOPP Pipeline Assistant
• Galaxy
• KNIME

KNIME and TOPPView
KNIME – KoNstanz Information MinEr
• Enables building customized workflows from OpenMS
components.
TOPPView: an OpenMS data viewer.
• Based on standard file formats.
• Shows MS/MS information,
peptides/proteins, and
quantitative information.

KNIME – Workflow System
KNIME – KoNstanz Information MinEr
Industrial-strength general-purpose workflow system
Convenient and easy-to-use graphical user interface
Available for Windows, OSX, Linux at http://KNIME.org
Interface panels (KNIME screenshot, CC BY-SA 4.0): Workflows, Plots, Tables, Console, Nodes

Workflow Builder: Data Flow
KNIME-OpenMS workflows consist of distinct nodes
that are assembled into workflows
Either tables or files are exchanged between nodes
along the edges of the workflow
Configuration dialogs are used to set node
parameters
Loops allow iterating sequentially over lists of data
Switches allow executing nodes or subworkflows
depending on a condition

Scripting
KNIME permits the embedding of R code for advanced statistics
Embedding of R scripts using the R Snippet node
All plotting capabilities of R can be used as well

Peptide/Protein Identification
Task: Identify peptides in multiple samples
Mass spectra enter workflow on the left
Loop nodes permit execution of parts of the workflow
Identified proteins end up in result files (right side)

TOPPView: Visualization of the results
mzML idXML

Workflow – Plug-In System
Combination of X!Tandem + OMSSA
Definition of QC parameters such as FDR, q-values, and p-values.
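
A minimal sketch of how FDR and q-values can be derived from a target-decoy search. It assumes that a higher score is better and that FDR is estimated as decoys / targets at each score threshold. The `PSM` class and `q_values` function are hypothetical names for illustration and are not part of OpenMS.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float      # assumption: higher score means a better match
    is_decoy: bool    # True if the hit comes from the decoy database


def q_values(psms):
    """Return (psm, q_value) pairs ordered from best to worst score."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return list(zip(ranked, qs))


if __name__ == "__main__":
    example = [
        PSM("A", 10.0, False),
        PSM("B", 9.0, False),
        PSM("C", 8.0, True),
        PSM("D", 7.0, False),
        PSM("E", 6.0, True),
    ]
    for psm, q in q_values(example):
        print(psm.peptide, psm.is_decoy, round(q, 3))
```

Filtering at, for example, q ≤ 0.01 then keeps only the hits above the score threshold where the estimated FDR stays at or below 1%.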

Complex and customized Workflows

Algorithm       X!Tandem          Mascot            MS-GF+            Merged
PIA             1214  64 (5.3%)   1442  74 (5.1%)   1631  93 (5.7%)   1615 101 (6.2%)
Fido             996  67 (6.7%)   1439  80 (5.6%)   1679  96 (5.7%)   1619 105 (6.5%)
ProteinLP        989  64 (6.5%)   1229  77 (2.3%)   1651  93 (5.6%)   1295 104 (8.0%)
MSBayesPro       749  24 (3.2%)    958  26 (2.7%)   1303  31 (2.4%)    963  36 (3.7%)
ProteinProphet  1027  64 (6.2%)   1282  73 (5.7%)   1629  91 (5.6%)   1629  99 (6.7%)

Audain E. & Uszkoreit J. et al., Journal of Proteomics, 2017

Best protein inference algorithm:
• 3 datasets
• 4 search engines
• 5 protein inference algorithms
• > 140 combinations

Some of the Identification nodes
IDPosteriorErrorProbability
Compute the posterior error probability for each PSM
Generate a new file with the corresponding values.
ConsensusID
Combine PSM identifications from multiple search
engines.
Generate a Combined PosteriorErrorProbability for
each PSM.
For each peptide ID, use the best score of any
search engine as the consensus score.
FalseDiscoveryRate
Estimates the false discovery rate at the peptide and
protein level using decoy searches.
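
The "best score of any search engine" rule described for ConsensusID can be sketched as follows. This is a simplified illustration for a single spectrum. It assumes the scores are posterior error probabilities, so lower is better. The function name and the input layout are hypothetical and do not mirror the OpenMS implementation.

```python
def consensus_best(engine_results):
    """Pick, for each peptide, the best (lowest) PEP reported by any engine.

    engine_results maps an engine name to a dict of peptide -> PEP.
    Returns a dict of peptide -> (best PEP, engine that reported it).
    """
    consensus = {}
    for engine, hits in engine_results.items():
        for peptide, pep in hits.items():
            best = consensus.get(peptide)
            if best is None or pep < best[0]:
                consensus[peptide] = (pep, engine)
    return consensus


if __name__ == "__main__":
    # Made-up values for illustration only.
    results = {
        "XTandem": {"PEPTIDEK": 0.05, "SAMPLER": 0.20},
        "MSGFPlus": {"PEPTIDEK": 0.01},
    }
    print(consensus_best(results))
```

Because the engines' raw scores are not comparable, this kind of merge only makes sense after the scores have been converted to a common scale, which is the role IDPosteriorErrorProbability plays before ConsensusID.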

Adapters and Complementary Nodes
FileMerger
This node takes two files (or file lists) as input and
outputs a merged list of both inputs. The order
corresponds to the order of the input lists and ports.
IDMerger
Merges several protein/peptide identification files
into one file.
PeptideIndexer
Refreshes the protein references for all peptide hits.
IDFilter
Filters results from protein or peptide identification
engines based on different criteria.

Quantitative Proteomics
Relative Quantification
  Labeled
    In vivo: 14N/15N, SILAC
    In vitro: iTRAQ, TMT, 16O/18O
  Label-Free: Spectral Counting, MRM, Feature-Based
Absolute Quantification: AQUA, SISCAPA
And many more…

Label-Free Quantification (LFQ)
Label-free quantification is probably the most natural way of
quantifying
• No labeling required, removing further sources of error, cheap
• Different samples acquired in different measurements – higher
reproducibility needed
• Manual analysis difficult
• Scales very well with the number of samples, basically no limit,
no difference in the analysis between 2 or 100 samples

Feature-based LFQ - LC-MS Maps
Spectra are acquired with rates up to dozens per second
Stacking the spectra yields peak maps
Resolution:
• Up to millions of points per spectrum
• Tens of thousands of spectra per LC run
Huge 2D datasets of up to hundreds of GB per sample
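
A back-of-the-envelope check of the size estimate. The figures are upper-bound assumptions chosen for illustration: one million points per spectrum, 50,000 spectra per run, and 8 bytes per point (a 32-bit m/z value plus a 32-bit intensity, uncompressed).

```python
def raw_map_size_gb(points_per_spectrum, spectra_per_run, bytes_per_point=8):
    """Uncompressed size of one LC-MS peak map in gigabytes (10^9 bytes)."""
    total_bytes = points_per_spectrum * spectra_per_run * bytes_per_point
    return total_bytes / 1e9


if __name__ == "__main__":
    # Assumed upper-bound values, not measurements.
    print(raw_map_size_gb(1_000_000, 50_000))
```

With these assumptions a single run comes to 400 GB, which is consistent with "hundreds of GB per sample". Real files are usually smaller because of compression and sparser spectra.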
Feature (eluting peptide)
Quantification (3x over-expressed, …)

Feature-based LFQ
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features
5. Quantify features
6. Quantify proteins based on
their peptides
              Sample 1   Sample 2   Sample 3
NPC2_HUMAN    1.0      : 5.2      : 0.3
CD177_HUMAN   1.0      : 0.2      : 0.4
⋮
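
The last step, going from peptide features to protein ratios such as 1.0 : 5.2 : 0.3, can be sketched as below. The sketch assumes that a protein's abundance in a sample is the sum of its peptide feature intensities, expressed relative to Sample 1. This is a simplification and not necessarily what ProteinQuantifier does. The function name and the intensities are made up, chosen only so that the result matches the ratios on the slide.

```python
def protein_ratios(peptide_rows, reference=0):
    """Sum peptide intensities per protein and express them relative to one sample.

    peptide_rows: iterable of (protein, peptide, [intensity per sample]).
    reference: index of the sample used as denominator (its sum must be non-zero).
    """
    totals = {}
    for protein, _peptide, intensities in peptide_rows:
        sums = totals.setdefault(protein, [0.0] * len(intensities))
        for i, value in enumerate(intensities):
            sums[i] += value
    return {
        protein: [round(v / sums[reference], 2) for v in sums]
        for protein, sums in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical linked features: (protein, peptide, intensities in samples 1-3).
    rows = [
        ("NPC2_HUMAN", "pep1", [100.0, 520.0, 30.0]),
        ("NPC2_HUMAN", "pep2", [200.0, 1040.0, 60.0]),
        ("CD177_HUMAN", "pep3", [500.0, 100.0, 200.0]),
    ]
    for protein, ratios in protein_ratios(rows).items():
        print(protein, " : ".join(str(r) for r in ratios))
```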

Label-Free Workflow
Different algorithms have been proposed by the OpenMS community for
label-free quantification:
• Weisser H., Journal of Proteome Research (2013)
• Zhang B., Molecular & Cellular Proteomics (2016)
• Veit J., Journal of Proteome Research (2016)
• Ranninger C., Analytica Chimica Acta (2016)

DeMix-Q Algorithm and Workflow
Bo Zhang, Lukas Käll & Roman A. Zubarev, MCP (2016)

Reliable and reproducible Quantitation

LFQ Relevant nodes
FeatureFinderCentroided
Detects two-dimensional features in LC-MS data.
MapAlignerPoseClustering
Corrects retention time distortions between maps
using a pose clustering approach.
FeatureLinkerUnlabeledQT
Groups corresponding features from multiple maps.
ConsensusMapNormalizer
Normalizes maps of one consensusXML file.
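
To illustrate what normalizing maps means in practice, here is a minimal sketch of median scaling: every map is rescaled so that its median feature intensity matches that of a reference map. This is one common approach shown under that assumption. The algorithms actually offered by ConsensusMapNormalizer may differ, and the function name is hypothetical.

```python
from statistics import median


def normalize_by_median(maps, reference=0):
    """Scale each map so that its median intensity equals the reference map's.

    maps: list of lists of feature intensities, one inner list per map.
    reference: index of the map whose median is used as the target.
    """
    ref_median = median(maps[reference])
    normalized = []
    for intensities in maps:
        factor = ref_median / median(intensities)
        normalized.append([x * factor for x in intensities])
    return normalized


if __name__ == "__main__":
    # Made-up intensities: the second map was measured at twice the overall signal.
    maps = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0]]
    print(normalize_by_median(maps))
```

Scaling by the median assumes that most features do not change between samples, so systematic differences in overall signal are treated as technical rather than biological.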

OpenMS at Large Scale
Galaxy
WS-PGRADE/gUSE
KNIME
Each individual tool can be run from the command line, which makes it
possible to distribute the work across large HPC environments.
$> FileFilter -in myinfile.mzML -levels 2 -rt 100:1500 -out myoutfile.mzML
$> OpenSwathDecoyGenerator.exe -in OpenSWATH_SGS_AssayLibrary.TraML -out OpenSWATH_SGS_AssayLibrary_with_Decoys.TraML -method shuffle -append exclude_similar -remove_unannotated
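
Because every tool is a plain command-line program, batch processing reduces to generating one command per input file and handing those commands to a scheduler or a worker pool. The sketch below builds FileFilter commands with the same flags as the example above. The flags are copied from the slide and may differ between OpenMS versions. The helper names, the output naming scheme, and the thread-pool runner are assumptions for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def filefilter_command(infile, outdir, rt_range="100:1500", ms_level=2):
    """Build the argument list for one FileFilter call (flags as on the slide)."""
    infile = Path(infile)
    outfile = Path(outdir) / f"{infile.stem}_filtered.mzML"
    return [
        "FileFilter",
        "-in", infile.as_posix(),
        "-levels", str(ms_level),
        "-rt", rt_range,
        "-out", outfile.as_posix(),
    ]


def run_all(commands, workers=4):
    """Run the commands in parallel and return their exit codes.

    Requires the tool to be installed and on the PATH.
    """
    def run_one(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    inputs = ["data/run1.mzML", "data/run2.mzML"]  # hypothetical file names
    commands = [filefilter_command(f, "filtered") for f in inputs]
    for cmd in commands:
        print(" ".join(cmd))
    # run_all(commands)  # uncomment on a machine with OpenMS installed
```

On a cluster, the same command lists would typically be written into job scripts for the scheduler instead of being run through a local thread pool.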

Conclusions
• OpenMS is a modular workflow system
• Standard workflows:
SILAC, iTRAQ/TMT, label-free, SWATH, quality
control
• Strong collaboration with other projects:
ProteoWizard, Thermo PD, KNIME, Fido,
Percolator, search engines, HUPO-PSI formats

How to run OpenMS workflows
• OpenMS, local installation
(Windows, OS X, Linux)
http://bit.ly/1J6lz6h
http://openms.de/workflows
• OpenMS in Proteome Discoverer
(LFQProfiler and RNPxl for PD 2.1)
http://openms.de/PD
• OpenMS in Galaxy
http://galaxy.uni-freiburg.de
• OpenMS in KNIME
https://tech.knime.org/community/bioinf/openms
