CNCP 2010
Guangchuang Yu
Jinan University
2010.11.19
Beijing 2010.11.10-11
Overview
• Fragmentation
– 孙瑞祥
• Labeling Strategy
– 陆豪杰
• De Novo Sequencing
– 董梦秋 马斌 王全会 张凯中
• Identification
– 余维川 付岩 叶明亮
• Label free semi-quantitation
– 邓宁
• Database Construction
– 邵晨 杨芃原
• Data Quality Control
– 朱云平
Overview
• Data Processing Platform
– 关慎恒 盛泉虎
• Glycoproteomics
– 杨芃原 应万涛 张凯中
• Proteogenomics
– 谢鹭 赵屹
• Biological Problem oriented
– 汪迎春 王通 徐平
• Protein Structure
– 张法 卜东坡
• Others
– 江瑞 张勇 张红雨
Fragmentation
孙瑞祥
ICT
Electron Transfer Dissociation: Characterization and
Applications in Protein Identification
2010. Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of
Electron Transfer Dissociation Spectra. J Proteome Res.
Important spectral characteristics of ETD are ignored or underutilized in
popular database search algorithms, such as Mascot, Sequest, OMSSA, or X! Tandem
Analyzed 461,440 spectra to characterize ETD fragmentation:
– distinct hydrogen rearrangement patterns of +2, +3 and +4 precursors
– charge-reduced precursor ions and associated neutral loss peaks
pFind identified 63-122% more unique peptides than Mascot for doubly
charged precursors at a 1% FDR cutoff.
Labeling Strategy
陆豪杰
Fudan Univ
In vivo termini amino acid labeling for quantitative proteomics
Covers 93% of the proteins deposited in UniProt.
More accurate identification and quantification.
Dual digest by Arg-C & Lys-N (increases sample complexity)
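A minimal sketch of what a dual digest does to a sequence, assuming the textbook cleavage rules only: Arg-C cuts on the C-terminal side of arginine (R) and Lys-N cuts on the N-terminal side of lysine (K). Missed cleavages and any sequence-context exceptions are ignored, and the function name and example sequence are hypothetical, not taken from the talk.

```python
def dual_digest(sequence: str) -> list[str]:
    """In silico Arg-C + Lys-N digest (simplified, no missed cleavages)."""
    if not sequence:
        return []
    peptides, start = [], 0
    for i in range(1, len(sequence)):
        # A cut falls between residues i-1 and i:
        # after R (Arg-C rule) or before K (Lys-N rule).
        if sequence[i - 1] == "R" or sequence[i] == "K":
            peptides.append(sequence[start:i])
            start = i
    peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence, for illustration only.
peptides = dual_digest("MKAAARKGGRTTK")
# Peptides that carry K at the N-terminus and R at the C-terminus.
both_termini = [p for p in peptides if p[0] == "K" and p[-1] == "R"]
```

Cutting with two enzymes yields more and shorter peptides than either enzyme alone, which is one way to read the "increases sample complexity" remark.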
De Novo Sequencing
2010. pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra. Journal of Proteome
Research 9:2713-2724.
董梦秋
NIBS
De novo Sequencing of Peptides using HCD Spectra
HCD produces high mass accuracy tandem mass spectra, the majority
of which contain complete ion series. Besides, abundant internal and
immonium ions in the HCD spectra can help differentiate between
similar sequences.
Ascaris suum sperm crawling
related proteins
pNovo
Identify peptide sequences
Blast
Homologs of C. elegans
Design primer for validation
De Novo Sequencing
马斌
U of Waterloo
Complete Homology-Assisted MS/MS Protein
Sequencing (CHAMPS)
2009. Automated protein (re)sequencing with MS/MS and a homologous database yields almost full coverage
and accuracy. Bioinformatics 25:2174 -2180.
Novel protein
SPIDER
Homologous sequence
De novo sequences
CHAMPS
Complete protein sequence
(above 99% coverage and 100% accuracy for two standard proteins)
De Novo Sequencing
王全会
BIG
From an unknown genome to a measurable proteome: Studying
on the pH-dependent proteomes in N10 bacteria by de novo
sequencing
2009. Exploring membrane and cytoplasm proteomic responses of Alkalimonas amylolytica N10 to different
external pHs with combination strategy of de novo peptide sequencing. Proteomics 9:1254-1273.
Tandem spectra with/without SPITC labeling
PEAKS for automated de novo sequencing
Manual analysis
Combine filtered data
Validation by PCR and Western blot
More than 70% of the differential 2-DE spots were identified
Identification
余维川
HKUST
Optimization-Based Peptide Mass Fingerprinting for Protein
Mixture Identification
2010. Optimization-based peptide mass fingerprinting for protein mixture identification. J. Comput. Biol 17:221-235.
• PMF method has two inherent disadvantages:
– Originally designed for identifying single purified proteins rather than
protein mixtures
– Can’t distinguish different peptides with identical mass
• Heuristic algorithm
– Introduce a scoring function for protein mixture identification
– Local search algorithms for protein mixture identification
• External factors might be optimized to facilitate successful protein
mixture identification
– Mass accuracy
– Sequence coverage
– Noise level
– Protein number in the mixtures
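The idea of scoring a protein mixture and improving it by local search can be sketched as follows. This is an illustrative toy, not the scoring function or search procedure from the paper: the score (explained peaks minus a per-protein penalty), the tolerance, and all names are assumptions.

```python
def explained(peaks, theoretical_masses, tol):
    """Observed peaks lying within tol of any theoretical peptide mass."""
    return {p for p in peaks if any(abs(p - m) <= tol for m in theoretical_masses)}


def mixture_score(subset, peaks, db, tol, penalty):
    """Assumed score: number of explained peaks minus a penalty per protein."""
    covered = set()
    for name in subset:
        covered |= explained(peaks, db[name], tol)
    return len(covered) - penalty * len(subset)


def local_search(peaks, db, tol=0.1, penalty=1.0):
    """Toggle one protein at a time; keep a move only if the score improves."""
    current = frozenset()
    best = mixture_score(current, peaks, db, tol, penalty)
    improved = True
    while improved:
        improved = False
        for name in sorted(db):
            candidate = current ^ {name}  # add the protein, or drop it if present
            s = mixture_score(candidate, peaks, db, tol, penalty)
            if s > best:
                current, best, improved = candidate, s, True
    return set(current), best


# Hypothetical theoretical peptide masses per protein and observed peaks.
db = {"A": [500.0, 700.0, 900.0], "B": [600.0, 800.0], "C": [500.05, 1000.0]}
peaks = [500.0, 600.0, 700.0, 800.0, 900.0]
mixture, score = local_search(peaks, db)
```

Protein C is rejected here because its only matching peak is already explained by A, which illustrates why peptides with identical or near-identical mass make mixtures hard for plain PMF.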
Identification
付岩
ICT
Unrestrictive modification detection based on related spectral
pairs
2009. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and
retention time differences. BMC Bioinformatics 10 (Suppl 1):S50.
• The majority of mass spectra cannot be interpreted at present
– Unexpected or unknown protein PTMs
• Detect abundant PTMs in high-accuracy peptide mass spectra
– Efficient and sequence database-independent approach
– Based on the observation that the spectra of a modified peptide and its unmodified
counterpart are correlated with each other in their peptide masses and retention time
– Frequently occurring peptide mass differences imply possible modifications
– Small and consistent retention time differences provide orthogonal supporting
evidence
– Use a bivariate Gaussian mixture model to discriminate modification-related spectral
pairs from random ones
• Results
– Experiments on two glycoprotein data sets demonstrate that the method can
effectively detect abundant modifications and spectral pairs.
– Including the discovered modifications in the database search interprets
an average of 10% more spectra
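The first step above, looking for frequently occurring mass differences among precursors that elute close together, can be sketched as a simple binned count. This is only the counting idea under assumed parameters; it does not implement the bivariate Gaussian mixture model from the paper, and the retention-time window, bin width, and example values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_deltas(precursors, max_rt_diff=5.0, bin_width=0.01, min_count=2):
    """Count binned mass differences between precursor pairs with similar retention time.

    precursors: list of (peptide_mass, retention_time) tuples.
    Returns (mass_delta, count) pairs seen at least min_count times, most frequent first.
    """
    counts = Counter()
    for (m1, rt1), (m2, rt2) in combinations(precursors, 2):
        if abs(rt1 - rt2) > max_rt_diff:
            continue  # far apart in retention time: unlikely to be a related pair
        counts[round(abs(m1 - m2) / bin_width)] += 1
    frequent = [(round(b * bin_width, 4), c) for b, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda item: item[1], reverse=True)


# Hypothetical precursors: two pairs share the same mass difference.
precursors = [
    (1000.00, 20.0), (1015.99, 21.0),
    (1200.50, 40.0), (1216.49, 41.5),
    (1500.00, 80.0),
]
deltas = frequent_mass_deltas(precursors)
```

A recurring difference such as the one found here is a candidate modification mass that could then be added to a database search.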
Identification
叶明亮
DICP
Development of Methods and Platform for Data Processing in
Mass Spectrometry Based Proteome Research
PMID: 17761002/19551949/18314942/20568719/19522514/20334362
• Un-modified peptide identification
– Implemented a predictive genetic algorithm for optimization of filtering criteria to
maximize the number of identified peptides at fixed FDR for SEQUEST
– Introduced an approach for calculating posterior probability of individual peptide
identification from the “local FDR” by using k nearest neighbors algorithm and Shannon
information entropy
• Phosphopeptide identification
– Developed an automatic validation approach for phosphopeptide identification by
combining consecutive stage MS data and the target-decoy database searching strategy
– Developed a classification filtering strategy to improve the phosphopeptide
identification and phosphorylation site localization
– Proposed a modified target-decoy database search strategy for confident
phosphorylation site analysis of individual phosphoproteins without manual
interpretation of spectra
– Developed the ArMone software for processing and analyzing phosphoproteome data
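Several points above rely on the target-decoy strategy and on filtering at a fixed FDR. A minimal sketch of the usual estimate (decoy hits divided by target hits above a score cutoff) is given below; it is the generic technique, not the genetic algorithm or the local-FDR method from the cited papers, and the function name and example scores are hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Find the most permissive score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as decoys / targets among PSMs at or above it.
    Returns (cutoff_score, accepted_targets), or None if no cutoff qualifies.
    Score ties are not treated specially in this sketch.
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Hypothetical PSM list: one decoy hit among five target hits.
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, False)]
loose = score_threshold_at_fdr(psms, max_fdr=0.25)
strict = score_threshold_at_fdr(psms, max_fdr=0.10)
```

Tightening the allowed FDR moves the cutoff above the first decoy hit, so fewer target PSMs are accepted.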
Label free semi-quantitation
邓宁
ZJU
Quantitative Analysis of Mitochondrial Proteomes using
Normalized Spectral Abundance Factor
Samples:
▪ 5 human cardiac mitochondrial samples
▪ 8 murine cardiac mitochondrial samples
▪ 7 murine liver mitochondrial samples
Workflow:
▪ LC-MS/MS
▪ Database search by SEQUEST, statistically validated by Scaffold
▪ In-house software generates NSAF values for quantitative analysis
Results:
▪ Electron transport chain proteins show the highest abundances, especially in the heart
▪ Metabolism-related proteins and urea cycle proteins are more abundant in the liver
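For reference, the normalized spectral abundance factor divides each protein's spectral count by its length and then normalizes over all proteins in the sample: NSAF_k = (SpC_k / L_k) / Σ_i (SpC_i / L_i). The sketch below computes it for a toy sample; it is not the in-house software mentioned above, and the protein names and numbers are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    spectral_counts: {protein: spectral count}
    lengths: {protein: sequence length in residues}
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts in this sample")
    return {p: value / total for p, value in saf.items()}


# Hypothetical sample with two proteins.
values = nsaf({"protA": 10, "protB": 20}, {"protA": 100, "protB": 400})
```

Although protB has twice the spectral count, its greater length gives it the lower NSAF, which is the length correction the measure is designed to provide.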
Database Construction
邵晨
PUMC
The urinary protein biomarker database
• Data collection
– Manual search in Pubmed
– Review by Students
• Database construction
• Basic analysis
– Compare different disease types
– Simple descriptive statistical analysis
– Construct a disease-biomarker network and show some basic
topological properties
Data Quality Control
朱云平
BPRC
A nonparametric model for quality control of database search
results in shotgun proteomics
2008. A nonparametric model for quality control of database search results in shotgun proteomics. BMC
Bioinformatics 9:29.
• Randomized databases are used for
quality control
• Existing randomized database methods do not
combine different database search scores,
which could improve their sensitivity
• A multivariate nonlinear discriminant
function (DF) based on the multivariate
nonparametric density estimation
technique was proposed to filter out
false-positive database search results
with a predictable FDR
Data Processing Platform
关慎恒
UCSF
A data processing platform for mammalian proteome dynamics
studies using stable isotope metabolic labeling
2010. Analysis of proteome dynamics in the mouse brain. Proceedings of the National Academy of Sciences
107:14508 -14513.
• Data processing platform
– Integrate a variety of software modules into a workflow
– Specifically developed for 15N metabolic labelling
▪ Cross-extraction of 15N-containing ion intensities from raw data files of varying biosynthetic
incorporation times
▪ Computation of peptide 15N incorporation distributions
▪ Aggregation of multiple peptide relative isotope abundance curves into a protein curve
– Processing parameter optimization and noise reduction are performed in the modules that
need them, to limit error propagation along the long chain of
processing steps
Data Processing Platform
盛泉虎
SIBS
BuildSummary: A software tool for assembling protein
• Maximize the number of confident proteins above a threshold of FDR
– By integrating results from different peptide search engines for the same dataset
• BuildSummary
– Allows users to combine many independent peptide-spectrum match (PSM) scoring
algorithms, including de novo sequencing and spectrum library search algorithms, provided the same
peptide FDR is applied to each of them using the target-decoy search approach
Glycoproteomics
Mass spectrometry database for glycoprotein structures
2009. Identification of N-Glycosylation Sites on Secreted Proteins of Human Hepatocellular Carcinoma Cells with a
Complementary Proteomics Approach. Journal of Proteome Research 8:662-672.
杨芃原
Fudan Univ
• Enrichment
– Hydrophilic affinity enrichment
– PNGase-F release of N-glycan
• Results
– Identified 4000 spectra of intact N-glycopeptides at an FDR of 1% in three
2DLC runs of a serum sample
– 1500 different glycopeptides, corresponding to 250 glycosylation sites,
were discovered
– Two separate high-confidence databases for the serum sample were
constructed:
▪ Naked glycopeptide (de-glycopeptide) database (523 peptides)
▪ N-glycan database (599 glycans)
– The GRIP software was developed for interpreting spectra from intact
glycopeptides
Glycoproteomics
应万涛
BPRC
Establishment of a systematic method coupling consecutive MSn
and software tools for characterizing core-fucosylated glycoproteins
2009. A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins. Molecular & Cellular
Proteomics 8:913 -923.
• Strategy development
– Novel enrichment step
▪ Combining the use of lectin for CF glycoprotein
enrichment with ultrafiltration for further
enrichment of glycopeptides
– Established a neutral loss-dependent MS3
scan method that specifically captures
partially deglycosylated CF glycopeptides
– Established a novel database-independent
candidate spectrum-filtering method for
selecting partially deglycosylated CF
glycopeptides and a spectrum optimization
method
Glycoproteomics
张凯中
UWO
Glycan Structure Sequencing with Tandem Mass Spectrometry
2008. Complexities and algorithms for glycan sequencing using tandem mass spectrometry. J Bioinform Comput
Biol 6:77-91. 2009.
• Glycan de novo sequencing
– Glycan database is rather incomplete
– Determination of novel glycan structures requires de novo
sequencing
• Heuristic algorithm
– First generates many acceptable small subtrees, which are
then joined together in a repetitive process to obtain larger
and larger suboptimal subtrees until reaching the desired
mass
– At each subtree size, only a limited number of subtrees
are kept for later use
– Experiments on real MS/MS data showed that the heuristic
algorithm can determine glycan structures
• Contribution
– A polynomial time algorithm is provided under a simple
model of glycan de novo sequencing
Proteogenomics
谢鹭
SIBS
The discovery of novel protein-coding features in mouse genome
based on mass spectrometry data
• Detect un-annotated protein-coding regions in mouse genome
– Two searchable proteomic databases were constructed
▪ All possible encoded exon junctions (EJCT dataset) for the discovery of
novel exon splice events
▪ Putative encoded exons (ORF database) for finding uninterrupted novel
protein coding regions
– Each of the two datasets was combined with a public full-length protein
dataset (competitive dataset), and the combined databases were searched
with spectra from 496 high-accuracy tandem MS RAW files from diverse mouse
samples
– 32 unique peptides (matching 149 spectra) from EJCT dataset
were discovered which straddle novel exon junctions
– 104 unique peptides (matching 450 spectra) from ORF dataset
were located in 99 unique protein-coding regions
Proteogenomics
赵屹
ICT
Proteogenomics analysis of Thermoanaerobacter
tengcongensis ( 腾冲嗜热菌 ) at different temperatures
• Genome
– Estimated to encode 2588 theoretical proteins
• Annotating Genome
– By combining proteomics and transcriptomics
▪ Transcriptomic data cover more than 70% of the 2588 genes
▪ More than 74% of spectra were consistent with the transcriptomic data
– Quantitative analysis of gene expression levels at 4 different
temperatures
▪ 359 genes were commonly expressed across temperatures
▪ Genes expressed uniquely at a single temperature were also detected
– 80 genes do not belong to the 2588-gene set
▪ 2 coding regions were supported by MS
▪ 21 coding regions may encode novel non-coding RNAs
– The discoveries were used to re-annotate the 2588-gene set
Biological Problem oriented
汪迎春
IGDB
Deciphering the Signaling Network in the Leading Edge of the
Migrating Cells
2007. Profiling signaling polarity in chemotactic cells. Proceedings of the National Academy of Sciences 104:8328-8333.
Characterization of the Ras/ERK Signaling Pathway in the
PD by Combined Proteome and Phosphoproteome Profiling
Biological Problem oriented
王通
JNU
Pathway analysis-assisted study strategy in functional
proteomics
2008. HIV-1 infected astrocytes and the microglial proteome. Journal of neuroimmune pharmacology 3:173-186.
• Biological Questions
– HIV associated neurodegenerative disorders (HAND)
– HIV associated malignancy (HAM)
– Infection and cancer
Biological Problem oriented
徐平
BPRC
Data analysis in large scale quantitative proteomics study with
SILAC approach
2009. Quantitative Proteomics Reveals the Function of Unconventional Ubiquitin Chains in Proteasomal
Degradation. Cell 137:133-145.
• Background
– K48-linked chains are mediators of
proteasomal degradation
– Chains linked through K6, K11, K27, K29 or K33 are not
well understood
• Results
– Identified K11 linkage-specific
substrates, including Ubc6, which is
involved in the ERAD pathway (ER
stress response)
Protein Structure
张法
ICT
Computational methods in cryo-electron microscopy: image data
processing and 3D structure reconstruction
2009. A framework to refine particle clusters produced by EMAN. Bioinformatics 12:i276-i280.
• EMAN
– One of the most popular software packages for
single particle reconstruction
• Particle reclustering framework (PRF)
– Normalization
– Threshold determination
– Reclustering
Data Analysis
卜东坡
ICT
Designing Succinct Structural Alphabets
2008. Designing succinct structural alphabets. Bioinformatics 24:i182 -i189.
• Fragment libraries
– A small number of structural fragments can model protein structures accurately
– Library size and accuracy are the dominant factors in modeling and predicting protein
structures accurately
– A major bottleneck for fragment-based protein structure prediction methods is designing
a succinct and highly accurate structural alphabet
• Contributions
– Introducing structural information items, such as secondary structure, solvent accessibility
and contact capacity, can improve the prediction of structural fragments
– Derive the best combination of both sequence and structural information items, and
significantly reduce the structural alphabet size, at the same level of accuracy by using integer
linear programming
– Significantly improve the protein structure prediction, with all other conditions unchanged
• Scoring function for mapping a sequence segment to a structural fragment
– Consists of mutation score, secondary structure score, contact capacity score, and
environment fitness score.
– Using more scoring items to improve the performance is promising
Others
江瑞
Tsinghua
DomainRBF: a Bayesian regression approach to the prioritization
of associations between protein domains and human complex
diseases
2010. Prioritisation of associations between protein domains and complex diseases using domain-domain
interaction networks. Systems Biology, IET 4:212-222.
• DomainRBF (domain Rank with Bayes Factor)
– To prioritize associations between candidate domains and human diseases
– Ranking score based on ‘guilt-by-association’ principle, which relies on the
assumption that a disease is likely to be caused by a set of genes that have
similar properties
• Data sources
– Domain-disease associations
– Domain-domain interaction networks
• Validation
– Large-scale cross validation experiments on simulated linkage intervals,
random controls and the whole genome
– Results show that areas under ROC curves can be as high as 77.9%
Others
张红雨
HZAU
Proteins as molecular fossils
2010. A Universal Molecular Clock of Protein Folds and its Power in Tracing the Early History of Aerobic
Metabolism and Planet Oxygenation. Molecular Biology and Evolution.
• Proteins can also serve as molecular fossils
• Building phylogenies and timelines of
domains at fold and fold superfamily levels
of structural complexity
– Using a phylogenomic structural census in hundreds
of proteomes
– Correlate approximately linearly with geological
timescales
– Dissected the structures and functions of enzymes
in simulated metabolic networks
– The placement of anaerobic and aerobic enzymes in
the timeline revealed that aerobic metabolism
emerged ~2.9 billion years ago
Others
张勇
BGI Shenzhen
From NGS Genomics to MS-based Proteomics – BGI’s
bioinformatics activities
• Advertising from BGI Shenzhen
– Introduce BGI’s developmental progress
All slides will be available at
http://cncp2010.ict.ac.cn/
Phosphorylation
Kevan Shokat
UCSF
Kinase-specific phosphorylation analysis
2008. Covalent capture of kinase-specific phosphopeptides reveals Cdk1-cyclin B substrates.
Proceedings of the National Academy of Sciences 105:1442 -1447.
Phosphorylation
Kevan Shokat
UCSF
Kinase-specific phosphorylation analysis
2004. Design and use of analog-sensitive protein kinases. Curr Protoc Mol Biol Chapter 18:Unit 18.11.
The amino acid that must be changed to
construct –as kinase alleles can be most
easily identified using a freely available online
resource at http://kinase.ucsf.edu/ksd/.
Thank You!

Cncp 2010

  • 1. CNCP 2010 Guangchuang Yu Jinan University 2010.11.19 Beijing 2010.11.10-11
  • 2. Overview • Fragmentation – 孙瑞祥 • Labeling Strategy – 陆豪杰 • De Novo Sequnceing – 董梦秋 马斌 王全会 张凯中 • Identification – 余维川 付岩 叶明亮 • Label free semi-quantitation – 邓宁 • Database Construction – 邵晨 杨芃原 • Data Quality Control – 朱云平
  • 3. Overview • Data Processing Platform – 关慎恒 盛泉虎 • Glycoproteomics – 杨芃原 应万涛 张凯中 • Proteogenomics – 谢鹭 赵屹 • Biological Problem oriented – 汪迎春 王通 徐平 • Protein Structure – 张法 卜东坡 • Others – 江瑞 张勇 张红雨
  • 4. Fragmentation 孙瑞祥 ICT Electron Transfer Dissociation: Characterization and Applications in Protein Identification 2010. Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of Electron Transfer Dissociation Spectra. J Proteome Res. Important spectral characteristics of ETD are ignored or underutilized in popular database search algorithms, such as Mascot, Sequest, OMSSA, OR X! TANDEM Analyzed 461,440 spectra to find ETD characterization distinct hydrogen rearrangement patterns of +2, +3 and +4 precursors charge-reduced precursor ions and associated neutral loss peaks pFind identified 63-122% more unique peptides than Mascot for doubly charged precursors at 1% FDR cutoff.
  • 5. Labeling Strategy 陆豪杰 Fudan Uinv In vivo termini amino acid labeling for quantitative proteomics Cover 93% proteins deposited in Uniprot. More accuracy for identification and quantification. Dual digest by Arg-C & Lys-N (increase sample complexity)
  • 6. De Novo Sequencing 2010. pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra. Journal of Proteome Research 9:2713-2724. 董梦秋 NIBS De novo Sequencing of Peptides using HCD Spectra HCD produces high mass accuracy tandem mass spectra, the majority of which contain complete ion series. Besides, abundant internal and immonium ions in the HCD spectra can help differentiate between similar sequences. Ascaris suum sperm crawling related proteins pNovo Identify peptide sequences Blast Homologs of C. elegans Design primer for validation
  • 7. De Novo Sequencing 马斌 U of Waterloo Complete Homology-Assisted MS/MS Protein Sequencing (CHAMPS) 2009. Automated protein (re)sequencing with MS/MS and a homologous database yields almost full coverage and accuracy. Bioinformatics 25:2174 -2180. Novel protein SPIDER Homologous sequenceDe novo sequences CHAMPS Complete protein sequence (above 99% coverage and 100% accuracy for two standard proteins)
  • 8. De Novo Sequencing 王全会 BIG From an unknown genome to a measurable proteome: Studying on the pH-dependent proteomes in N10 bacteria by de novo sequencing 2009. Exploring membrane and cytoplasm proteomic responses of Alkalimonas amylolytica N10 to different external pHs with combination strategy of de novo peptide sequencing. Proteomics 9:1254-1273. Tandem spectra with/without SPITC labeling PEAKS for auto de novo Manually analyzed Combine filtered data Validation by PCR and Western blot More than 70% of the differential 2-DE spots were identified
  • 9. Identification 余维川 HKUST Optimization-Based Peptide Mass Fingerprinting for Protein Mixture Identification 2010. Optimization-based peptide mass fingerprinting for protein mixture identification. J. Comput. Biol. 17:221-235. • The PMF method has two inherent disadvantages: – Originally designed for identifying single purified proteins rather than protein mixtures – Cannot distinguish different peptides with identical mass • Heuristic algorithm – Introduces a scoring function for protein mixture identification – Local search algorithms for protein mixture identification • External factors might be optimized to facilitate successful protein mixture identification – Mass accuracy – Sequence coverage – Noise level – Number of proteins in the mixture
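The mixture-identification idea on this slide can be illustrated with a deliberately simple stand-in. The sketch below is not the scoring function or local search from the cited paper; it is a greedy peak-coverage heuristic, and the function name `greedy_mixture`, the tolerance, and the input layout are all assumptions made for illustration.

```python
def greedy_mixture(observed, candidates, tol=0.01, max_proteins=3):
    """Greedily pick proteins whose theoretical peptide masses explain
    the most still-unexplained observed peaks (illustrative stand-in only).

    observed   -- list of observed peptide masses
    candidates -- dict: protein name -> list of theoretical peptide masses
    tol        -- absolute mass tolerance for a match (assumed value)
    """
    unexplained = set(range(len(observed)))
    chosen = []
    while unexplained and len(chosen) < max_proteins:
        best_name, best_hits = None, set()
        for name, masses in candidates.items():
            if name in chosen:
                continue
            # Indices of unexplained peaks matched by this protein.
            hits = {i for i in unexplained
                    if any(abs(observed[i] - m) <= tol for m in masses)}
            if len(hits) > len(best_hits):
                best_name, best_hits = name, hits
        if best_name is None:  # no remaining protein explains any peak
            break
        chosen.append(best_name)
        unexplained -= best_hits
    return chosen
```

A real scoring function would also have to penalize unmatched theoretical peptides and account for the noise level, which this coverage count ignores.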
  • 10. Identification 付岩 ICT Unrestrictive modification detection based on related spectral pairs 2009. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences. BMC Bioinformatics 10 (Suppl 1):S50. • The majority of mass spectra cannot be interpreted at present – Unexpected or unknown protein PTM • Detect abundant PTM in high-accuracy peptide mass spectra – Efficient and sequence database-independent approach – Based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated with each other in their peptide masses and retention time – Frequently occurring peptide mass differences imply possible modifications – Small and consistent retention time differences provide orthogonal supporting evidence – Use a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones • Results – Experiments on two glycoprotein data sets demonstrate that the method can effectively detect abundant modifications and spectral pairs. – By including the discovered modifications into database search, an average of 10% more spectra are interpreted
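The observation that frequently occurring mass differences between co-eluting peptides hint at modifications can be sketched as follows. This is a simplified counting step only: a fixed retention-time window replaces the bivariate Gaussian mixture model described on the slide, and the function name, thresholds, and `(mass, retention_time)` tuple layout are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_shifts(peptides, max_shift=100.0, max_rt_gap=5.0,
                         bin_width=0.01):
    """Count binned pairwise mass differences among peptides that elute
    close together. Returns [(mass_shift, count), ...], most frequent first.

    peptides -- list of (mass, retention_time) tuples (assumed layout)
    """
    counts = Counter()
    for (m1, rt1), (m2, rt2) in combinations(peptides, 2):
        delta = abs(m2 - m1)
        if 0 < delta <= max_shift and abs(rt2 - rt1) <= max_rt_gap:
            # Use integer bins as keys to avoid float-key mismatches.
            counts[round(delta / bin_width)] += 1
    return [(b * bin_width, n) for b, n in counts.most_common()]
```

Shifts that recur far more often than the rest are candidates for modification masses; deciding which ones are real is where the statistical model on the slide would come in.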
  • 11. Identification 叶明亮 DICP Development of Methods and Platform for Data Processing in Mass Spectrometry Based Proteome Research PMID: 17761002/19551949/18314942/20568719/19522514/20334362 • Un-modified peptide identification – Implemented a predictive genetic algorithm for optimization of filtering criteria to maximize the number of identified peptides at fixed FDR for SEQUEST – Introduced an approach for calculating posterior probability of individual peptide identification from the “local FDR” by using k nearest neighbors algorithm and Shannon information entropy • Phosphopeptide identification – Developed an automatic validation approach for phosphopeptide identification by combining consecutive stage MS data and the target-decoy database searching strategy – Developed a classification filtering strategy to improve the phosphopeptide identification and phosphorylation site localization – Proposed a modified target-decoy database search strategy for confident phosphorylation site analysis of individual phosphoproteins without manual interpretation of spectra – Developed a software ArMone for processing and analysis of phosphoproteome data
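Several slides rely on filtering search results at a fixed FDR with the target-decoy strategy. A minimal sketch of that filtering step is given below, assuming the common estimate FDR ≈ decoys / targets among accepted matches; the function name and the `(score, is_decoy)` tuple layout are hypothetical, and score ties are not handled specially.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set has an estimated
    FDR (decoys / targets) of at most max_fdr, or None if none qualifies.

    psms -- list of (score, is_decoy) tuples; higher score is better
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accepting everything down to this score gives this FDR estimate.
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

Optimizing filtering criteria, as described on the slide, amounts to searching over several such cutoffs at once to maximize the number of accepted target matches while keeping this estimate below the chosen FDR.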
  • 12. Label free semi-quantitation 邓宁 ZJU Quantitative Analysis of Mitochondrial Proteomes using Normalized Spectral Abundance Factor Samples: ▪ 5 human cardiac mitochondrial samples ▪ 8 murine cardiac mitochondrial samples ▪ 7 murine liver mitochondrial samples LC-MS/MS; database search by SEQUEST, statistically validated by Scaffold; in-house software generates NSAF values for quantitative analysis. Results: ▪ Electron transport chain proteins show the highest abundances, especially in heart ▪ Metabolism-related proteins and urea cycle proteins are more abundant in the liver
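For reference, the Normalized Spectral Abundance Factor of protein i is usually defined as NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j), where SpC is the spectral count and L the protein length. The sketch below implements that standard definition; it is not the in-house software mentioned on the slide, and the function and argument names are assumptions.

```python
def nsaf(spectral_counts, lengths):
    """Compute NSAF values for one sample.

    spectral_counts -- dict: protein -> number of matched spectra
    lengths         -- dict: protein -> sequence length in residues
    """
    # Spectral abundance factor: counts normalized by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that the values in one sample sum to 1.
    return {p: v / total for p, v in saf.items()}
```

Because the values sum to 1 within each run, NSAF can be compared across samples with different total spectral counts, which is what makes it usable for the heart versus liver comparison.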
  • 13. Database Construction 邵晨 PUMC The urinary protein biomarker database • Data collection – Manual search in PubMed – Review by students • Database construction • Basic analysis – Compare different disease types – Simple descriptive statistical analysis – Construct a disease-biomarker network and show some basic topological properties
  • 14. Data Quality Control 朱云平 BPRC A nonparametric model for quality control of database search results in shotgun proteomics 2008. A nonparametric model for quality control of database search results in shotgun proteomics. BMC Bioinformatics 9:29. • Randomized databases are used for quality control • Existing methods neglect to combine different database search scores, which could improve the sensitivity of randomized database methods • A multivariate nonlinear discriminant function (DF) based on the multivariate nonparametric density estimation technique was proposed to filter out false-positive database search results with a predictable FDR
  • 15. Data Processing Platform 关慎恒 UCSF A data processing platform for mammalian proteome dynamics studies using stable isotope metabolic labeling 2010. Analysis of proteome dynamics in the mouse brain. Proceedings of the National Academy of Sciences 107:14508-14513. • Data processing platform – Integrates a variety of software modules into a workflow – Specifically developed for 15N metabolic labeling ▪ Cross-extraction of 15N-containing ion intensities from raw data files of varying biosynthetic incorporation times ▪ Computation of peptide 15N incorporation distributions ▪ Aggregation of multiple peptide relative isotope abundance curves into a protein curve – Parameter optimization and noise reduction are performed in the processing modules that need them, to limit error propagation through the long chain of processing steps
  • 16. Data Processing Platform 盛泉虎 SIBS BuildSummary: A software tool for assembling protein • Maximize the number of confident proteins at a given FDR threshold – By integrating results from different peptide search engines for the same dataset • BuildSummary – Allows users to combine many independent PSM (peptide-spectrum match) scoring algorithms, including de novo sequencing and spectrum library search algorithms, provided the same peptide FDR is applied to each of them using the target-decoy search approach
  • 17. Glycoproteomics Mass spectrometry database for glycoprotein structures 2009. Identification of N-Glycosylation Sites on Secreted Proteins of Human Hepatocellular Carcinoma Cells with a Complementary Proteomics Approach. Journal of Proteome Research 8:662-672. 杨芃原 Fudan Univ • Enrichment – Hydrophilic affinity enrichment – PNGase-F release of N-glycan • Results – Identified 4000 spectra of intact N-glycopeptides at an FDR of 1% in three 2DLC runs of a serum sample – 1500 different glycopeptides, corresponding to 250 glycosylation sites, were discovered – Two separate high-confidence databases for the serum sample were constructed: ▪ Naked glycopeptides (de-glycopeptides) database (523 peptides) ▪ N-glycan database (599 glycans) – The software GRIP was developed for interpretation of spectra from intact glycopeptides
  • 18. Glycoproteomics 应万涛 BPRC Establishment of a systematic method coupling consecutive MSn and software tools for characterizing core-fucosylated glycoproteins 2009. A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins. Molecular & Cellular Proteomics 8:913-923. • Strategy development – Novel enrichment step ▪ Combining lectin-based enrichment of CF glycoproteins with ultrafiltration for further enrichment of glycopeptides – Established a neutral loss-dependent MS3 scan method that specifically captures partially deglycosylated CF glycopeptides – Established a novel database-independent candidate spectrum-filtering method for selecting partially deglycosylated CF glycopeptides and a spectrum optimization method
  • 19. Glycoproteomics 张凯中 UWO Glycan Structure Sequencing with Tandem Mass Spectrometry 2008. Complexities and algorithms for glycan sequencing using tandem mass spectrometry. J Bioinform Comput Biol 6:77-91. 2009. • Glycan de novo sequencing – Glycan databases are rather incomplete – Determination of novel glycan structures requires de novo sequencing • Heuristic algorithm – First generates many acceptable small subtrees, which are then joined together in a repetitive process to obtain larger and larger suboptimal subtrees until reaching the desired mass – At each subtree size, only a limited number of subtrees are kept for later use – Experiments on real MS/MS data showed that the heuristic algorithm can determine glycan structures • Contribution – A polynomial time algorithm is provided under a simple model of glycan de novo sequencing
  • 20. Proteogenomics 谢鹭 SIBS The discovery of novel protein-coding features in mouse genome based on mass spectrometry data • Detect un-annotated protein-coding regions in the mouse genome – Two searchable proteomic databases were constructed ▪ All possible encoded exon junctions (EJCT dataset) for the discovery of novel exon splice events ▪ Putative encoded exons (ORF database) for finding uninterrupted novel protein-coding regions – Each of the two datasets was combined with a public full-length protein dataset (competitive dataset) and searched against 496 high-accuracy tandem MS RAW files from diverse mouse samples – 32 unique peptides (matching 149 spectra) that straddle novel exon junctions were discovered from the EJCT dataset – 104 unique peptides (matching 450 spectra) from the ORF dataset were located in 99 unique protein-coding regions
  • 21. Proteogenomics 赵屹 ICT Proteogenomics analysis of Thermoanaerobacter tengcongensis (腾冲嗜热菌) at different temperatures • Genome – Estimated to encode 2588 theoretical proteins • Annotating Genome – By combining proteomics and transcriptomics ▪ Transcriptomic data cover more than 70% of the 2588 genes ▪ More than 74% of spectra were consistent with the transcriptomic data – Quantitative analysis of gene expression levels at 4 different temperatures ▪ 359 genes were commonly expressed ▪ Uniquely expressed genes were also detected at distinct temperatures – 80 genes do not belong to the 2588-gene set ▪ 2 coding regions were supported by MS ▪ 21 coding regions may encode novel non-coding RNA – The discovery was used to re-annotate the 2588-gene set
  • 22. Biological Problem oriented 汪迎春 IGDB Deciphering the Signaling Network in the Leading Edge of the Migrating Cells 2007. Profiling signaling polarity in chemotactic cells. Proceedings of the National Academy of Sciences 104:8328 -8333. Characterization of the Ras/ERK Signaling Pathway in the PD by Combined Proteome and Phosphoproteome Profiling
  • 23. Biological Problem oriented 王通 JNU Pathway analysis-assisted study strategy in functional proteomics 2008. HIV-1 infected astrocytes and the microglial proteome. Journal of neuroimmune pharmacology 3:173-186. • Biological Questions – HIV associated neurodegenerative disorders (HAND) – HIV associated malignancy (HAM) – Infection and cancer
  • 24. Biological Problem oriented 徐平 BPRC Data analysis in large scale quantitative proteomics study with SILAC approach 2009. Quantitative Proteomics Reveals the Function of Unconventional Ubiquitin Chains in Proteasomal Degradation. Cell 137:133-145. • Background – K48-linked chains are mediators of proteasomal degradation – Chains linked through K6, K11, K27, K29 or K33 are not well understood • Results – Identified K11 linkage-specific substrates, including Ubc6, which is involved in the ERAD pathway (ER stress response)
  • 25. Protein Structure 张法 ICT Computational methods in cryo-electron microscopy: image data processing and 3D structure reconstruction 2009. A framework to refine particle clusters produced by EMAN. Bioinformatics 12:i276-i280. • EMAN – One of the most popular software packages for single particle reconstruction • Particle reclustering framework (PRF) – Normalization – Threshold determination – Reclustering
  • 26. Data Analysis 卜东坡 ICT Designing Succinct Structural Alphabets 2008. Designing succinct structural alphabets. Bioinformatics 24:i182-i189. • Fragment libraries – A small number of structural fragments can model protein structures accurately – Library size and accuracy are the dominating factors for modeling and predicting protein structures accurately – A major bottleneck for fragment-based protein structure prediction methods is designing a succinct and highly accurate structural alphabet • Contributions – Introducing structural information items, such as secondary structure, solvent accessibility and contact capacity, can improve the prediction of structural fragments – Derive the best combination of sequence and structural information items, and significantly reduce the structural alphabet size at the same level of accuracy by using integer linear programming – Significantly improve protein structure prediction, with all other conditions unchanged • Scoring function for mapping a sequence segment to a structural fragment – Consists of mutation score, secondary structure score, contact capacity score, and environment fitness score – Using more scoring items to improve the performance is promising
  • 27. Others 江瑞 Tsinghua DomainRBF: a Bayesian regression approach to the prioritization of associations between protein domains and human complex diseases 2010. Prioritisation of associations between protein domains and complex diseases using domain-domain interaction networks. Systems Biology, IET 4:212-222. • DomainRBF (domain Rank with Bayes Factor) – To prioritize association between candidate domains and human disease – Ranking score based on ‘guilt-by-association’ principle, which relies on the assumption that a disease is likely to be caused by a set of genes that have similar properties • Data sources – Domain-disease associations – Domain-domain interaction networks • Validation – Large-scale cross validation experiments on simulated linkage intervals, random controls and the whole genome – Results show that areas under ROC curves can be as high as 77.9%
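The validation metric quoted on this slide, the area under the ROC curve, equals the probability that a randomly chosen true association is ranked above a randomly chosen non-association. A minimal sketch of that pairwise definition follows; the function name and inputs are assumptions, and the quadratic loop is meant for small illustrative inputs, not for genome-scale evaluation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- ranking scores, higher means more likely a true association
    labels -- truthy for true associations, falsy for controls
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 77.9% figure.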
  • 28. Others 张红雨 HZAU Proteins as molecular fossils 2010. A Universal Molecular Clock of Protein Folds and its Power in Tracing the Early History of Aerobic Metabolism and Planet Oxygenation. Molecular Biology and Evolution. • Proteins can also serve as molecular fossils • Building phylogenies and timelines of domains at fold and fold superfamily levels of structural complexity – Using a phylogenomic structural census in hundreds of proteomes – Correlate approximately linearly with geological timescales – Dissected the structures and functions of enzymes in simulated metabolic networks – The placement of anaerobic and aerobic enzymes in the timeline revealed that aerobic metabolism emerged ~2.9 billion years ago
  • 29. Others 张勇 BGI Shenzhen From NGS Genomics to MS-based Proteomics – BGI’s bioinformatics activities • Advertising from BGI Shenzhen – Introduce BGI’s developmental progress
  • 30. All slides will be available at http://cncp2010.ict.ac.cn/
  • 31. Phosphorylation Kevan Shokat UCSF Kinase-specific phosphorylation analysis 2008. Covalent capture of kinase-specific phosphopeptides reveals Cdk1-cyclin B substrates. Proceedings of the National Academy of Sciences 105:1442 -1447.
  • 32. Phosphorylation Kevan Shokat UCSF Kinase-specific phosphorylation analysis 2004. Design and use of analog-sensitive protein kinases. Curr Protoc Mol Biol Chapter 18:Unit 18.11. The amino acid that must be changed to construct –as kinase alleles can be most easily identified using a freely available online resource at http://kinase.ucsf.edu/ksd/.