SlideShare ist ein Scribd-Unternehmen logo
1 von 22
PeptideProphet Explained Brian C. Searle Proteome Software Inc. www.proteomesoftware.com 1336 SW Bertha Blvd, Portland OR 97219 (503) 244-6027 An explanation of the Peptide Prophet algorithm developed by Keller, A., Nesvizhskii, A. I.,  Kolker, E., and Aebersold, R. (2002)  Anal. Chem.   74,  5383 –5392
Threshold model spectrum scores protein peptide sort by match score Before PeptideProphet was developed, a threshold model was the standard way of evaluating the peptides matched by a search of MS/MS spectra against a protein database.  The threshold model sorts search results by a match score.
Set some threshold spectrum scores protein peptide sort by match score SEQUEST XCorr > 2.5 dCn > 0.1 Mascot Score > 45 X!Tandem Score < 0.01 Next, a threshold value was set.  Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds.  Different thresholds may also be needed for different charge states, sample complexity, and database size.
Below threshold matches dropped spectrum scores protein peptide sort by match score SEQUEST XCorr > 2.5 dCn > 0.1 Mascot Score > 45 X!Tandem Score < 0.01 “ correct” “ incorrect” Peptides that are identified with scores above the threshold are considered “correct” matches.  Those with scores below the threshold are considered “incorrect”. There is no gray area where something is possibly correct.
There has to be a better way ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],The threshold model has these problems, which PeptideProphet tries to solve:
Creating a discriminant score spectrum scores protein peptide sort by match score PeptideProphet starts with a discriminant score.  If an application uses several scores, (SEQUEST uses Xcorr,   Cn, and Sp scores; Mascot uses ion scores plus identity and homology thresholds), these are first converted to a single discriminant score.
Discriminant score for SEQUEST SEQUEST’s  XCorr  (correlation score) is corrected for length of the peptide. High correlation is rewarded. SEQUEST’s   Cn  tells how far the top score is from the rest. Being far ahead of others is rewarded. The top ranked by SEQUEST’s  Sp  score has  ln(rankSp) =0. Lower ranked scores are penalized. Poor mass accuracy (big   Mass ) is also penalized. For example, here’s the formula to combine SEQUEST’s scores into a discriminant score: ln( XCorr ) 8.4* ln(# AAs ) 7.4* 0.2*ln( rankSp ) 0.3* 0.96  Cn D  Mass                   
Histogram of scores Discriminant score (D) Number of spectra in each bin Once Peptide Prophet calculates the discriminant scores for all the spectra in a sample, it makes a histogram of these discriminant scores. For example, in the sample shown here, 70 spectra have scores around 2.5.
Mixture of distributions “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin This histogram shows the distributions of correct and incorrect matches.  PeptideProphet assumes that these distributions are standard statistical distributions. Using curve-fitting, PeptideProphet draws the correct and incorrect distributions.
Bayesian statistics “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin Once correct and incorrect distributions are drawn, PeptideProphet uses Bayesian statistics to compute the probability  p(+|D)  that a match is correct, given a discriminant score  D .
Probability of a correct match “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin The statistical formula looks fierce, but relating it to the histogram shows that the prob of a score of 2.5 being correct is
PeptideProphet model is accurate All Technical Replicates Together (large) Individual Samples (small) Keller et al., Anal Chem 2002 Keller, et al. checked PeptideProphet on a control data set for which they knew the right answer. Ideally, the PeptideProphet-computed probability should be identical to the actual probability, corresponding to a 45-degree line on this graph. They tested PeptideProphet with both large and small data sets and found pretty good agreement with the real probability. Since it was published, the Institute for Systems Biology has used PeptideProphet on a number of protein samples of varying complexity.
PeptideProphet  more sensitive than threshold model correctly identifies everything, with  no error Keller et al, Anal Chem 2002 This graph shows the trade-offs between the errors (false identifications) and the sensitivity (the percentage of possible peptides identified). The ideal is zero error and everything identified (sensitivity = 100%).  PeptideProphet corresponds to the curved line. Squares 1–5 are thresholds chosen by other authors.
PeptideProphet compared to Sequest Xcorr cutoff of 2 XCorr>2 dCn>0.1 NTT=2 Keller et al., Anal Chem 2002 correctly identifies everything, with  no error For example, for a threshold of Xcorr > 2 and   Cn>.1 with only fully tryptic peptides allowed (see square 5 on the graph), Sequest’s error rate is only 2%. However, its sensitivity is only 0.6 — that is, only 60% of the spectra are identified. Using PeptideProphet, the same 2% error rate identifies 90% of the spectra, because the discriminant score is tuned to provide better results.
Peptide Prophet compared to charge-dependent cutoff +2 XCorr>2 +3 XCorr>2.5 dCn>0.1 NTT>=1 Keller et al., Anal Chem 2002 correctly identifies everything, with  no error Another example uses a different threshold for charge +2 and charge +3 spectra (see square 2 on the graph).  For this threshold, the error rate is 8% and the sensitivity is 80%. At an error rate of 8%, PeptideProphet identifies 95% of the peptides.
PeptideProphet allows you to choose an error rate Keller et al., Anal Chem 2002 correctly identifies everything, with  no error A big advantage is that you can choose any error rate you like, such as 5% for inclusive searches, or 1% for extremely accurate searches.
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],There has to be a better way Recall the problems that PeptideProphet was designed to fix.  How well did it do?
PeptideProphet better scores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],discriminant score The discriminant score combines the various scores into one optimal score.
PeptideProphet better control of error rate ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],discriminant score estimate error with distributions The error vs. sensitivity curves derived from the distributions allow you to choose the error rate.
PeptideProphet better adaptability   ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],discriminant score estimate error with distributions curve-fit distributions to data (EM) Each experiment has a different histogram of discriminant scores, to which the probability curves are automatically adapted.
PeptideProphet better reporting ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],discriminant score estimate error with distributions curve-fit distributions to data (EM) report P-values Because results are reported as probabilities, you can compare different programs, samples, and experiments.
PeptideProphet Summary ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Weitere ähnliche Inhalte

Ähnlich wie Explaining Peptide Prophet

Exponential software reliability using SPRT: MLE
Exponential software reliability using SPRT: MLEExponential software reliability using SPRT: MLE
Exponential software reliability using SPRT: MLEIOSR Journals
 
Data Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToData Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToDwayne Neal
 
Heart disease classification
Heart disease classificationHeart disease classification
Heart disease classificationSnehaDey21
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Pptbarthriley
 
Part 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionPart 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionJoachim Jacob
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшаваValeriya Simeonova
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018Philip Ramsey
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATADETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATAIJCSEA Journal
 
Application of Genetic Algorithm in Software Testing
Application of Genetic Algorithm in Software TestingApplication of Genetic Algorithm in Software Testing
Application of Genetic Algorithm in Software TestingGhanshyam Yadav
 
Tests for one-sample_sensitivity_and_specificity
Tests for one-sample_sensitivity_and_specificityTests for one-sample_sensitivity_and_specificity
Tests for one-sample_sensitivity_and_specificityTulio Batista Veras
 
High Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionHigh Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionThermo Fisher Scientific
 
Webinar slides how to reduce sample size ethically and responsibly
Webinar slides   how to reduce sample size ethically and responsiblyWebinar slides   how to reduce sample size ethically and responsibly
Webinar slides how to reduce sample size ethically and responsiblynQuery
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptShayanChowdary
 

Ähnlich wie Explaining Peptide Prophet (20)

Exponential software reliability using SPRT: MLE
Exponential software reliability using SPRT: MLEExponential software reliability using SPRT: MLE
Exponential software reliability using SPRT: MLE
 
Data Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer ToData Analysis Of An Analytical Method Transfer To
Data Analysis Of An Analytical Method Transfer To
 
Heart disease classification
Heart disease classificationHeart disease classification
Heart disease classification
 
Sh rn awhitepaper
Sh rn awhitepaperSh rn awhitepaper
Sh rn awhitepaper
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
 
Part 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionPart 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expression
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшава
 
Cb36469472
Cb36469472Cb36469472
Cb36469472
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
JPR2010_TDMB
JPR2010_TDMBJPR2010_TDMB
JPR2010_TDMB
 
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATADETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
 
(Spring 2013) Signature Authentication Consistency Across Devices
(Spring 2013) Signature Authentication Consistency Across Devices(Spring 2013) Signature Authentication Consistency Across Devices
(Spring 2013) Signature Authentication Consistency Across Devices
 
Application of Genetic Algorithm in Software Testing
Application of Genetic Algorithm in Software TestingApplication of Genetic Algorithm in Software Testing
Application of Genetic Algorithm in Software Testing
 
Tests for one-sample_sensitivity_and_specificity
Tests for one-sample_sensitivity_and_specificityTests for one-sample_sensitivity_and_specificity
Tests for one-sample_sensitivity_and_specificity
 
High Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionHigh Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant Detection
 
Webinar slides how to reduce sample size ethically and responsibly
Webinar slides   how to reduce sample size ethically and responsiblyWebinar slides   how to reduce sample size ethically and responsibly
Webinar slides how to reduce sample size ethically and responsibly
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
 
Database Searching
Database SearchingDatabase Searching
Database Searching
 

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Kürzlich hochgeladen (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Explaining Peptide Prophet

  • 1. PeptideProphet Explained Brian C. Searle Proteome Software Inc. www.proteomesoftware.com 1336 SW Bertha Blvd, Portland OR 97219 (503) 244-6027 An explanation of the Peptide Prophet algorithm developed by Keller, A., Nesvizhskii, A. I., Kolker, E., and Aebersold, R. (2002) Anal. Chem. 74, 5383 –5392
  • 2. Threshold model spectrum scores protein peptide sort by match score Before PeptideProphet was developed, a threshold model was the standard way of evaluating the peptides matched by a search of MS/MS spectra against a protein database. The threshold model sorts search results by a match score.
  • 3. Set some threshold spectrum scores protein peptide sort by match score SEQUEST XCorr > 2.5 dCn > 0.1 Mascot Score > 45 X!Tandem Score < 0.01 Next, a threshold value was set. Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds. Different thresholds may also be needed for different charge states, sample complexity, and database size.
  • 4. Below threshold matches dropped spectrum scores protein peptide sort by match score SEQUEST XCorr > 2.5 dCn > 0.1 Mascot Score > 45 X!Tandem Score < 0.01 “ correct” “ incorrect” Peptides that are identified with scores above the threshold are considered “correct” matches. Those with scores below the threshold are considered “incorrect”. There is no gray area where something is possibly correct.
  • 5.
  • 6. Creating a discriminant score spectrum scores protein peptide sort by match score PeptideProphet starts with a discriminant score. If an application uses several scores, (SEQUEST uses Xcorr,  Cn, and Sp scores; Mascot uses ion scores plus identity and homology thresholds), these are first converted to a single discriminant score.
  • 7. Discriminant score for SEQUEST SEQUEST’s XCorr (correlation score) is corrected for length of the peptide. High correlation is rewarded. SEQUEST’s  Cn tells how far the top score is from the rest. Being far ahead of others is rewarded. The top ranked by SEQUEST’s Sp score has ln(rankSp) =0. Lower ranked scores are penalized. Poor mass accuracy (big  Mass ) is also penalized. For example, here’s the formula to combine SEQUEST’s scores into a discriminant score: ln( XCorr ) 8.4* ln(# AAs ) 7.4* 0.2*ln( rankSp ) 0.3* 0.96  Cn D  Mass                   
  • 8. Histogram of scores Discriminant score (D) Number of spectra in each bin Once Peptide Prophet calculates the discriminant scores for all the spectra in a sample, it makes a histogram of these discriminant scores. For example, in the sample shown here, 70 spectra have scores around 2.5.
  • 9. Mixture of distributions “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin This histogram shows the distributions of correct and incorrect matches. PeptideProphet assumes that these distributions are standard statistical distributions. Using curve-fitting, PeptideProphet draws the correct and incorrect distributions.
  • 10. Bayesian statistics “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin Once correct and incorrect distributions are drawn, PeptideProphet uses Bayesian statistics to compute the probability p(+|D) that a match is correct, given a discriminant score D .
  • 11. Probability of a correct match “ correct” “ incorrect” Discriminant score (D) Number of spectra in each bin The statistical formula looks fierce, but relating it to the histogram shows that the prob of a score of 2.5 being correct is
  • 12. PeptideProphet model is accurate All Technical Replicates Together (large) Individual Samples (small) Keller et al., Anal Chem 2002 Keller, et al. checked PeptideProphet on a control data set for which they knew the right answer. Ideally, the PeptideProphet-computed probability should be identical to the actual probability, corresponding to a 45-degree line on this graph. They tested PeptideProphet with both large and small data sets and found pretty good agreement with the real probability. Since it was published, the Institute for Systems Biology has used PeptideProphet on a number of protein samples of varying complexity.
  • 13. PeptideProphet more sensitive than threshold model correctly identifies everything, with no error Keller et al, Anal Chem 2002 This graph shows the trade-offs between the errors (false identifications) and the sensitivity (the percentage of possible peptides identified). The ideal is zero error and everything identified (sensitivity = 100%). PeptideProphet corresponds to the curved line. Squares 1–5 are thresholds chosen by other authors.
  • 14. PeptideProphet compared to Sequest Xcorr cutoff of 2 XCorr>2 dCn>0.1 NTT=2 Keller et al., Anal Chem 2002 correctly identifies everything, with no error For example, for a threshold of Xcorr > 2 and  Cn>.1 with only fully tryptic peptides allowed (see square 5 on the graph), Sequest’s error rate is only 2%. However, its sensitivity is only 0.6 — that is, only 60% of the spectra are identified. Using PeptideProphet, the same 2% error rate identifies 90% of the spectra, because the discriminant score is tuned to provide better results.
  • 15. Peptide Prophet compared to charge-dependent cutoff +2 XCorr>2 +3 XCorr>2.5 dCn>0.1 NTT>=1 Keller et al., Anal Chem 2002 correctly identifies everything, with no error Another example uses a different threshold for charge +2 and charge +3 spectra (see square 2 on the graph). For this threshold, the error rate is 8% and the sensitivity is 80%. At an error rate of 8%, PeptideProphet identifies 95% of the peptides.
  • 16. PeptideProphet allows you to choose an error rate Keller et al., Anal Chem 2002 correctly identifies everything, with no error A big advantage is that you can choose any error rate you like, such as 5% for inclusive searches, or 1% for extremely accurate searches.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.