Protein sequencing and its application in bioinformatics

PROTEIN SEQUENCING AND ITS APPLICATION IN BIOINFORMATICS
BY,
ARINDAM CHAKRABORTY
M.PHARM, 2ND SEMESTER
PHARMACEUTICAL BIOTECHNOLOGY
CIPT AND AHS
CONTENTS
1. Introduction
2. History
3. Prepare the proteins for sequencing
4. Sequencing methods
5. N-terminal sequencing
6. C-terminal sequencing
7. DNA sequencing
8. Protein mass spectrometry
9. Bioinformatics tools
INTRODUCTION
1. Protein:
• A polymer of amino acids.
• Protein structure and function depend upon the amino acid sequence.
2. Protein Sequencing:
• A technique to determine the amino acid sequence of a protein.
• Important for understanding cellular functions.
• Important in targeting drugs to specific metabolic pathways.
HISTORY
• 1951: Fred Sanger reported the amino acid sequence of the B chain of insulin. By 1955 the complete sequence of insulin, the first protein to be sequenced, had been determined. Sanger later (1977) developed the chain-termination method for DNA sequencing, also called the “Sanger method”. It was a milestone in sequencing long DNA molecules and was eventually used in the Human Genome Project.
• 1969: Analysis of tRNA sequences was used to infer residue interactions from correlated changes in the nucleotide sequence, giving rise to models of tRNA secondary structure.
• 1970: Saul B. Needleman and Christian D. Wunsch published the first computer algorithm for aligning two sequences.
• 1977: Publication of the first complete DNA genome, that of bacteriophage φX174.
PREPARE THE PROTEINS FOR SEQUENCING
1. If the protein contains more than one polypeptide chain, the chains are separated and purified.
2. Intrachain S-S (disulfide) cross-bridges between cysteine residues in the polypeptide chain are cleaved. If the disulfides are instead interchain linkages, step 2 must precede step 1.
3. The amino acid composition of each polypeptide chain is determined.
4. The N-terminal and C-terminal residues are identified.
5. Each polypeptide chain is cleaved into smaller fragments.
6. The sequence of each peptide fragment is determined.
7. The overall amino acid sequence of the protein is reconstructed from the sequences of overlapping fragments.
8. The positions of the S-S cross-bridges formed between cysteine residues are located.
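The reconstruction step relies on fragments from two different cleavage methods overlapping one another. A minimal Python sketch of this idea assumes error-free fragments in one-letter code and greedily merges the pair with the longest suffix/prefix overlap. The function names, the `min_len` parameter and the example peptide are hypothetical. With real data, short overlaps can be ambiguous, and greedy merging can choose the wrong order.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str], min_len: int = 1) -> list[str]:
    """Greedily merge overlapping peptide fragments (hypothetical helper)."""
    # Drop duplicates, then drop fragments wholly contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        # Pick the ordered pair (i, j) whose suffix/prefix overlap is longest.
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces cannot be ordered
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical peptide GAVLIMFWPS, cleaved in two different ways.
digest_1 = ["GAVL", "IMF", "WPS"]
digest_2 = ["GA", "VLIM", "FWPS"]
print(assemble(digest_1 + digest_2))  # ['GAVLIMFWPS']
```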
• Separation of Polypeptide Chains:
Subunit associations in multimeric proteins are typically maintained solely by noncovalent forces. Therefore, most multimeric proteins can be dissociated by exposure to pH extremes, 8 M urea, 6 M guanidinium hydrochloride, or high salt concentrations.
• Cleavage of Disulfide Bridges:
Oxidation of a disulfide by performic acid converts it into two cysteic acid residues.
SEQUENCING METHODS
• N-terminal sequencing
• C-terminal sequencing
• Prediction from DNA sequence
N-TERMINAL SEQUENCING
• N-terminal sequencing is done by:
1. Edman’s degradation method
2. Sanger’s method
3. Dansyl chloride method
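EDMAN’S DEGRADATION METHOD
• Principle:
It sequentially removes one residue at a time from the amino (N-terminal) end of a peptide.
• Mechanism:
• Phenyl isothiocyanate reacts with the uncharged N-terminal amino group, under mildly alkaline conditions, to form a phenylthiocarbamoyl (PTC) derivative.
• Under acidic conditions, the N-terminal residue is then cleaved off as a thiazolinone derivative, and the rest of the peptide is left intact.
• The thiazolinone derivative is extracted into an organic solvent and treated with acid to form the more stable phenylthiohydantoin (PTH) amino acid, which is identified by chromatography.
• The shortened peptide then goes through the next cycle, so the sequence is read one residue per cycle.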
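The cyclic nature of the method can be shown with a toy simulation. This Python sketch is hypothetical: `edman_degradation` is an illustrative name and the peptide is invented. It only tracks which residue is removed in each cycle and ignores the chemistry, including incomplete reactions.

```python
def edman_degradation(peptide: str, cycles: int) -> tuple[list[str], str]:
    """Simulate Edman cycles on a peptide written in one-letter code.

    Returns the PTH-amino acids identified, in order, and the peptide left afterwards.
    """
    identified = []
    remaining = peptide
    for _ in range(min(cycles, len(peptide))):
        # Coupling: phenyl isothiocyanate labels the free N-terminal amino group (PTC-peptide).
        # Cleavage: acid releases only the labelled N-terminal residue as a thiazolinone.
        n_terminal, remaining = remaining[0], remaining[1:]
        # Conversion: the thiazolinone becomes a stable PTH-amino acid, identified by chromatography.
        identified.append(f"PTH-{n_terminal}")
        # The shortened peptide stays intact and enters the next cycle.
    return identified, remaining


residues, left_over = edman_degradation("GIVEQ", cycles=3)
print(residues)   # ['PTH-G', 'PTH-I', 'PTH-V']
print(left_over)  # EQ
```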
SANGER’S METHOD
• Treat the polypeptide with 1-fluoro-2,4-dinitrobenzene (DNFB) to form a dinitrophenyl (DNP) derivative of the amino-terminal amino acid.
• Acid hydrolysis of all peptide bonds.
• Extraction of the DNP-amino acid with an organic solvent.
• Identification of the DNP-amino acid by chromatography and comparison with standards.
DANSYL CHLORIDE METHOD
• Reagent: 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride).
• The dansylated polypeptide chain is prepared.
• Acid hydrolysis liberates all the amino acids, including the N-terminal dansyl amino acid.
• The amino acids are separated.
• The fluorescence of the dansyl amino acid is detected.
• The N-terminal amino acid is identified by comparison with standard dansylated amino acids.
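Sanger’s method and the dansyl chloride method follow the same logic: label the N-terminus, hydrolyse the whole chain, and identify the single labelled residue. Hydrolysis destroys the chain, so each sample yields only the N-terminal residue, unlike Edman degradation. The hypothetical Python sketch below illustrates this. The function name, the label strings and the peptide are illustrative assumptions.

```python
from collections import Counter


def end_group_analysis(peptide: str, label: str = "DNP") -> tuple[str, Counter]:
    """Label the N-terminal residue, hydrolyse the chain, and report the labelled residue.

    label="DNP" stands for Sanger's DNFB method; label="Dansyl" for the dansyl chloride method.
    """
    if not peptide:
        raise ValueError("empty peptide")
    # Step 1: the reagent labels the free N-terminal amino group.
    # (Both reagents also react with some side chains, e.g. lysine; ignored here.)
    labelled = f"{label}-{peptide[0]}"
    # Step 2: acid hydrolysis breaks every peptide bond. The label stays on the
    # N-terminal residue, but all other residues become unlabelled free amino acids,
    # so their order is lost.
    free_amino_acids = Counter(peptide[1:])
    # Step 3: only the labelled residue is identified (chromatography or fluorescence).
    return labelled, free_amino_acids


print(end_group_analysis("FVNQHL", label="DNP")[0])     # DNP-F
print(end_group_analysis("FVNQHL", label="Dansyl")[0])  # Dansyl-F
```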
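C-TERMINAL SEQUENCING
• Add carboxypeptidase to a solution of the protein.
• Take samples at regular intervals.
• Plot the concentration of each released amino acid against time. The amino acid that appears first is the C-terminal residue, and the amino acids that appear later come from the positions preceding it.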
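Interpreting the time course amounts to ranking amino acids by when they first appear. This Python sketch is hypothetical: the data are invented for illustration, and the `threshold` value and tie-breaking rule are assumptions. In practice, carboxypeptidases release different residues at different rates, so this reasoning is usually reliable only for the first few residues.

```python
def release_order(time_course: dict[str, list[float]], threshold: float = 0.05) -> list[str]:
    """Rank amino acids by how early carboxypeptidase releases them.

    time_course maps each amino acid to its concentration (mol per mol of protein)
    at the same series of sampling times. An amino acid counts as released at the
    first time point where its concentration exceeds `threshold`; ties are broken
    by the larger concentration at that time point. Amino acids that never exceed
    the threshold are left out.
    """
    first_release = {}
    for amino_acid, concentrations in time_course.items():
        for index, value in enumerate(concentrations):
            if value > threshold:
                first_release[amino_acid] = (index, -value)
                break
    # Earliest first: the C-terminal residue, then the residue before it, and so on.
    return sorted(first_release, key=first_release.__getitem__)


# Invented measurements at 0, 5, 10, 20 and 40 min.
samples = {
    "Gly": [0.00, 0.00, 0.02, 0.15, 0.40],
    "Ser": [0.00, 0.10, 0.30, 0.60, 0.90],
    "Leu": [0.00, 0.60, 0.85, 0.95, 1.00],
}
print(release_order(samples))  # ['Leu', 'Ser', 'Gly']
```

With these numbers, the sketch would read the C-terminus as ...Gly-Ser-Leu.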
DNA SEQUENCING
• A protein sequence can also be determined indirectly from its mRNA (as cDNA) or its gene.
• Design primers from a known (partial) amino acid sequence and amplify the gene.
• Sequence the gene and translate it, using the genetic code, to obtain the amino acid sequence of the protein.
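The final step, translating the gene or cDNA into protein, can be sketched in Python with the standard genetic code. The sketch assumes that the coding sequence is in frame from its first base and that translation ends at the first stop codon. `translate` and `CODON_TABLE` are illustrative names and are not part of any library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids in TCAG x TCAG x TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid code."""
    cds = cds.upper().replace("U", "T")  # also accept mRNA-style input
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAAGCTTGGTAA"))  # MKAW
```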
MASS SPECTROMETRY
• It is an important method for accurate mass determination and characterization of proteins.
• Basic Principle: The technique studies the effect of ionizing energy on molecules. It depends upon gas-phase reactions in which sample molecules are consumed to form ionic and neutral species.
• Components: The instrument consists of three major components:
1. Ion source: produces gaseous ions from the substance being studied.
2. Analyzer: resolves the ions into their characteristic mass components according to their mass-to-charge ratio (m/z).
3. Detector system: detects the ions and records the relative abundance of each resolved ionic species.
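The analyzer separates ions by m/z, so a peptide’s expected m/z can be calculated from its sequence as m/z = (M + z × 1.007276) / z. Here M is the neutral monoisotopic mass (the sum of the residue masses plus one water) and z is the number of added protons. This Python sketch assumes unmodified residues (for example, free cysteine) and treats protonation as the only source of charge. The function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass M of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def peptide_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(seq) + charge * PROTON) / charge


print(round(peptide_mz("GG", 1), 4))  # 133.0608
```

Leucine and isoleucine have identical masses, so a mass alone cannot tell them apart.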
BIOINFORMATICS TOOLS
• Bioinformatics:
• The collection, classification, storage and analysis of biochemical and biological information using computers, especially as applied to molecular genetics and genomics.
• It is an interdisciplinary field that develops methods and software tools for understanding biological data.
• It combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.
MASTER LAYOUT OF PROTEIN SEQUENCING
TYPES
• Based on the number of sequences being compared, sequence alignment is of two types:
• Pairwise alignment
• Multiple alignment
PAIRWISE SEQUENCE ALIGNMENT
• Pairwise sequence alignment compares only two sequences at a time.
a b a c d
a b _ c d
• The optimal alignment is the one with the best (highest) score.
• A pairwise alignment consists of a series of paired residues (bases or amino acids), one from each sequence.
• There are three types of pairs:
1. Matches: the same residue appears in both sequences.
2. Mismatches: different residues are found in the two sequences.
3. Gaps: a residue in one sequence is paired with a null (gap) in the other.
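To make scoring concrete, a minimal Python sketch can score the example alignment above. The values of +1 for a match, −1 for a mismatch and −1 per gap are assumptions chosen for illustration. Protein alignments usually use substitution matrices such as BLOSUM62 and affine gap penalties instead.

```python
def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, gap_char: str = "_") -> int:
    """Sum the scores of the column pairs of an existing alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            score += gap        # gap: a residue paired with a null
        elif a == b:
            score += match      # match
        else:
            score += mismatch   # mismatch
    return score


print(alignment_score("abacd", "ab_cd"))  # 3  (four matches, one gap)
```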
• The algorithms used are the Needleman-Wunsch algorithm (global alignment) and the Smith-Waterman algorithm (local alignment).
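Both algorithms fill a dynamic-programming matrix. The difference is that Smith-Waterman never lets a cell fall below zero and reports the best cell anywhere in the matrix, which produces a local alignment. The minimal Python sketch below uses an assumed scoring scheme of +1 for a match, −1 for a mismatch and −1 per gap. The function names are illustrative, and the traceback is shown only for the global case.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1, gap_char: str = "_") -> tuple[int, str, str]:
    """Global alignment: returns (score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap            # leading gaps in b
    for j in range(1, m + 1):
        F[0][j] = j * gap            # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right corner, preferring diagonal moves.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append(gap_char); i -= 1
        else:
            top.append(gap_char); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Local alignment score: cells are floored at 0 and the best cell anywhere is kept."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


print(needleman_wunsch("abacd", "abcd"))                 # (3, 'abacd', 'ab_cd')
print(smith_waterman_score("xxabcdyy", "zzabcdzz"))      # 4
```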
• BLAST (Basic Local Alignment Search Tool)
• BLAST encompasses many different implementations and enhancements of a search algorithm that finds high-scoring segment pairs (HSPs) between a query and the sequences in a database.
• It is a fast, heuristic way to find similar sequences.
• Because it is heuristic, it is not the most sensitive way to search.
• It is one of the most widely used tools in bioinformatics.
BLAST STEPS
• Seeding: Prepare a list of short, fixed-length segments (words) from the query.
• Searching: Find highly similar or exact matches to each word in the database sequences.
• Extension: Extend each match (hit) in both directions into a longer alignment.
• Evaluation: Evaluate the statistical significance of the results using E-values.
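The four steps can be illustrated with a much-simplified BLAST-like Python sketch. It uses exact word seeds, ungapped extension with an X-drop cutoff, and an E-value from the Karlin-Altschul formula E = K·m·n·e^(−λS). This is not the BLAST implementation. The toy +1/−1 scoring, the word length, the X-drop value and the K and λ values are all assumptions. Real BLAST also uses neighbourhood words, substitution matrices and gapped extension.

```python
import math


def score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Toy scoring (BLAST itself uses substitution matrices such as BLOSUM62 for proteins)."""
    return match if a == b else mismatch


def seed_words(query: str, w: int) -> dict[str, list[int]]:
    """Seeding: index every word of length w in the query by its start position."""
    words: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    return words


def extend_hit(query: str, subject: str, qi: int, si: int, w: int, x_drop: int):
    """Extension: grow a word hit without gaps, right then left, stopping once the
    running score falls more than x_drop below the best score seen so far."""
    best = sum(score(query[qi + k], subject[si + k]) for k in range(w))
    run, right, k = best, 0, 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        run += score(query[qi + w + k], subject[si + w + k])
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > x_drop:
            break
    run, left, k = best, 0, 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        run += score(query[qi - k - 1], subject[si - k - 1])
        k += 1
        if run > best:
            best, left = run, k
        elif best - run > x_drop:
            break
    # (score, query start, subject start, length) of the high-scoring segment pair
    return best, qi - left, si - left, left + w + right


def blast_like_search(query: str, subject: str, w: int = 3, x_drop: int = 2):
    """Searching: scan the subject for exact matches to query words, then extend each hit."""
    words = seed_words(query, w)
    hsps = set()  # several seeds can extend to the same HSP
    for si in range(len(subject) - w + 1):
        for qi in words.get(subject[si:si + w], []):
            hsps.add(extend_hit(query, subject, qi, si, w, x_drop))
    return sorted(hsps, reverse=True)  # best-scoring HSPs first


def e_value(raw_score: float, m: int, n: int, K: float = 0.1, lam: float = 1.0) -> float:
    """Evaluation: E = K*m*n*exp(-lambda*S). K and lambda are placeholder values,
    not parameters of any real scoring system."""
    return K * m * n * math.exp(-lam * raw_score)


query, subject = "MKTAYIAKQR", "GGMKTAYIAKQRGG"
hsps = blast_like_search(query, subject)
print(hsps)  # [(10, 0, 2, 10)]
print(e_value(hsps[0][0], len(query), len(subject)))
```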
MULTIPLE SEQUENCE ALIGNMENT
• Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment. Instead of aligning just two sequences, three or more sequences are aligned simultaneously.
a b a c d
a b _ c d
x b a c e
• MSA is used for:
a. Detection of conserved domains in a group of genes or proteins.
b. Construction of a phylogenetic tree.
c. Prediction of protein structure.
d. Determination of consensus sequences.
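Consensus determination and the detection of conserved positions can be sketched in Python using the example alignment above. The sketch assumes that all aligned sequences have the same length and that '_' marks a gap. It takes a simple majority in each column, which is only one of several possible consensus rules. The function names are illustrative.

```python
from collections import Counter


def consensus(alignment: list[str], gap: str = "_") -> str:
    """Majority residue in each column, ignoring gaps (ties go to the first residue seen)."""
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        result.append(Counter(residues).most_common(1)[0][0] if residues else gap)
    return "".join(result)


def conserved_columns(alignment: list[str], gap: str = "_") -> list[int]:
    """Indices of columns in which every sequence has the same non-gap residue."""
    return [i for i, column in enumerate(zip(*alignment))
            if gap not in column and len(set(column)) == 1]


msa = ["abacd", "ab_cd", "xbace"]
print(consensus(msa))          # abacd
print(conserved_columns(msa))  # [1, 3]
```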
CLUSTAL
• A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp (1988).
• CLUSTAL makes a global multiple alignment using a “progressive alignment” approach.
• It first computes all pairwise alignments and calculates the sequence similarity between each pair.
• These similarities are used to build a rough guide tree.
• The sequences are then aligned progressively, following the branching order of the guide tree and starting with the most similar sequences.
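The first two CLUSTAL stages, computing pairwise distances and building a guide tree, can be sketched in Python. This is not CLUSTAL's code. For brevity, the sketch uses edit (Levenshtein) distance as the pairwise measure and UPGMA clustering to build the tree; both are assumptions. It also stops before the final progressive alignment. The sequence names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a unit-cost global alignment of two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # gap in b
                           cur[j - 1] + 1,            # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def guide_tree(seqs: dict[str, str]):
    """Build a rooted guide tree (nested tuples) by UPGMA clustering of pairwise distances."""
    names = list(seqs)
    clusters = [(name, 1) for name in names]  # (subtree, number of sequences)
    dist = [[edit_distance(seqs[a], seqs[b]) / max(len(seqs[a]), len(seqs[b]))
             for b in names] for a in names]
    while len(clusters) > 1:
        # Closest pair of clusters (i < j).
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda p: dist[p[0]][p[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        # UPGMA: distance to the merged cluster is the size-weighted average.
        new_row = [(dist[i][k] * size_i + dist[j][k] * size_j) / (size_i + size_j)
                   for k in range(len(clusters))]
        for idx in (j, i):  # delete the larger index first so i stays valid
            del clusters[idx], dist[idx], new_row[idx]
            for row in dist:
                del row[idx]
        for row, value in zip(dist, new_row):
            row.append(value)
        dist.append(new_row + [0.0])
        clusters.append(((tree_i, tree_j), size_i + size_j))
    return clusters[0][0]


sequences = {"s1": "abacd", "s2": "abcd", "s3": "xbace", "s4": "zzzzz"}
print(guide_tree(sequences))  # ('s4', ('s3', ('s1', 's2')))
```

In a progressive aligner, s1 and s2 would be aligned first, then s3 added, then s4.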
BASIC INFORMATION COMES FROM SEQUENCE
• One sequence: gives some information, e.g., amino acid properties.
• More than one sequence: gives more information on conserved residues, fold and function.
• Multiple alignments of related sequences: can be used to build consensus sequences of known families, domains, motifs or sites.
• Sequence alignments can give information on loops, families and function from conserved regions.
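Simple amino acid properties can be calculated directly from a single sequence. This Python sketch calculates the amino acid composition and the mean Kyte-Doolittle hydropathy (GRAVY score). The choice of this scale and the function names are illustrative assumptions; other property scales could be used in the same way.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq: str) -> dict[str, float]:
    """Fraction of each amino acid in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


peptide = "MKTAYIAKQR"  # hypothetical example sequence
print(composition(peptide)["K"])  # 0.2
print(round(gravy(peptide), 2))   # -0.78
```

A negative GRAVY score indicates an overall hydrophilic sequence, and a positive score indicates an overall hydrophobic one.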
APPLICATIONS OF PROTEIN SEQUENCING
• Recombinant protein synthesis.
• Drug production.
• Antibiotic production.
• Functional genomics.
• Determination of protein folding patterns.
• In bioinformatics.
• It plays a vital role in proteomics.
• Prediction of the final structure, function and location of a protein.
• Locating the gene that codes for the protein.
• Study of genetic diseases.
• Identification of sequence differences and variations, such as point mutations.
• Revealing the evolution and genetic diversity of sequences and organisms.
THANK YOU