Bio305 Bacterial Genome Annotation and Analysis Professor Mark Pallen
Overview
General features of genomes
Bacterial genome organisation
Overview of a genome project
Whole-Genome Shotgun Sanger Sequencing: bacterial chromosome; random shearing; size selection; cloning into plasmid vector; pick colonies to create shotgun library; plasmid preps; sequence each insert with two primers
High-throughput Sequencing
High-Throughput Shotgun Sequencing: bacterial chromosome; random shearing; size selection; add adapters; amplify; sequence
Illumina Sequencing
The Sequence Assembly Problem
The Repeat Problem
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
(figure labels: Correct; Incorrect)
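The sequence on the repeat slide contains the same 20-base run twice, which is enough to illustrate why repeats confuse an assembler. The sketch below is only an illustration: the two reads and the helper `find_all` are hypothetical, and a real assembler works with overlaps between reads rather than a finished genome.

```python
def find_all(genome, read):
    """Return every 0-based start position at which read occurs, overlaps included."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Sequence from the slide, with the spaces removed.
GENOME = (
    "ATTTATGTGT" "GTGTGGTGTG" "GTGTGGTGTG" "CACTACTGCT"
    "ACTACTGCTGACTACT" "GTGTGGTGTG" "GTGTGGTGTG" "ATATCCCT"
)

# Hypothetical read lying wholly inside the repeat: it fits in two places.
repeat_read = "GTGTGGTGTGGTGTGGTGTG"
# Hypothetical read that runs from the repeat into unique flanking sequence.
anchored_read = "GTGTGCACTA"

repeat_hits = find_all(GENOME, repeat_read)
anchored_hits = find_all(GENOME, anchored_read)
print("repeat read placements:", repeat_hits)
print("anchored read placements:", anchored_hits)
```

A read that sits entirely inside the repeat cannot be placed unambiguously, whereas a read that reaches unique sequence can, which is the motivation for longer reads and paired ends.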
Paired-end Sequencing: bacterial chromosome; random shearing; size selection for 3kb or 8kb etc; add linkers; circularise; shear and select on size and presence of linkers; add adapters; obtain sequences from either side of linker, a known distance apart in genome
Genome Assembly (figure labels: scaffold; contigs 1, 2 and 3; physical gap; sequence gap)
Re-sequencing
SNP calling
Genome annotation
How to go from this….?
… to this?
Or this?
Caveat
Sources of information for annotation
Approaches to functional annotation
Base composition aids genome analysis: GC skew, (G-C)/(G+C), identifies origin of replication and leading and lagging strands (figure labels: genes coded by location & function; %G+C; genes shared with E. coli; genes unique to S. typhi)
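GC skew can be computed directly from base counts. The following is a minimal sketch, assuming non-overlapping windows and an upper-case sequence; the function names, the window size and the toy sequence are illustrative choices, not taken from the lecture.

```python
def gc_skew(seq):
    """GC skew (G-C)/(G+C) for one stretch of sequence; 0.0 if it has no G or C."""
    g = seq.count("G")
    c = seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_skew(seq, window, step):
    """Return (start, skew) for each full window along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGCGGGC" + "CCCGCCCG"
profile = windowed_skew(toy, window=8, step=8)
for start, skew in profile:
    print(start, skew)
```

On a real chromosome the sign of the skew tends to switch near the origin and terminus of replication, so plotting this profile (or its cumulative sum) along the genome is one way to look for them.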
Analysis of nucleotide sequence data
Gene Finding in bacteria
Identifying protein-coding sequences
The problem of conflicting ORFs (figure labels: non-coding ORFs; CDSs; note that an ORF can extend upstream of the start codon)
The Problem of Frameshift Errors
Actual sequence
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
M  S  T  A  K  L  V  K  S  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
 •  V  P  L  N  •  L  N  Q  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
  E  Y  R  •  I  S  •  I  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E  K
Frameshifted sequence after single base error
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
M  S  T  A  K  L  V  K  S  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E
 •  V  P  L  N  •  L  N  Q  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
  E  Y  R  •  I  S  •  I  K  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
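The effect of the single-base error on this slide can be reproduced with a few lines of code. This is a sketch under stated assumptions: it uses the standard codon table, and it models the error as one extra A inserted after base 30, inside the run of As; which A of that run is the inserted one cannot be told from the sequence. The names `translate`, `ACTUAL` and `SHIFTED` are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate seq in the given forward frame (0, 1 or 2); '*' marks a stop."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


# Actual sequence from the slide.
ACTUAL = (
    "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACC"
    "CGCAACGATGTCTCCGACAGCGAGAAA"
)
# Assumed error: one extra A inserted after base 30.
SHIFTED = ACTUAL[:30] + "A" + ACTUAL[30:]

print(translate(ACTUAL))      # reading frame 1 of the actual sequence
print(translate(SHIFTED))     # reading frame 1 after the insertion
print(translate(ACTUAL, 2))   # reading frame 3 of the actual sequence
```

Downstream of the error, frame 1 of the frameshifted sequence reads the same residues as frame 3 of the actual sequence, so a gene finder sees the coding signal jump from one frame to another part-way through the gene.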
CDS Prediction: Graphical Plots (figure panels: GC content by reading frame; amino-acid composition by reading frame, compared to average for globular proteins)
CDS Prediction: Markov Models
Annotation of protein-coding genes
Homology
Homology
the cat   sat  on  the mat
die Katze sass auf der Matte
vge|GBant88-2   ITLITCVSVKDNSKRYVVAG
vge|GEfae9-178  LTLITCDQATKTTGRIIVIA
vge|GSpne1-403  MTLITCDPIPTFNKRLLVNF
sortase_staur   LTLITCDDYNEKTGVWEKRK
Types of Homology
Homology Searches
What is BLAST?
The several flavours of BLAST
Choosing the right flavour
Low complexity filtering
Understanding BLAST Results
Bit Scores: high is good. E-values: low is good. See http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
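Bit scores and E-values are two views of the same statistic: in the Karlin-Altschul framework the expected number of chance hits is roughly E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below only illustrates that relationship; the function name and the lengths are made up, and real BLAST programs apply effective-length corrections, so reported E-values will not match this formula exactly.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value for a bit score in a search space of query_len x db_len."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against a 10-million-residue database.
for bits in (30, 40, 50, 68):
    print(bits, expect_value(bits, 300, 10_000_000))
```

Each extra bit of score halves the E-value, and the same score becomes less significant as the database grows, which is why E-values rather than raw scores are compared across searches.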
Typical Blast Output
                                                              Reading  High   Sum Probability
Sequences producing High-scoring Segment Pairs:                 Frame  Score  P(N)        N
emb|X69337|ECDPS     E.coli dps gene for binding protein          +2    834   6.4e-109    1
gb|U04242|ECU04242   Escherichia coli core starvation p...        +3    828   2.7e-106    1
emb|X14180|ECGLNHPQ  Escherichia coli glutamine permeas...        +3    443   2.8e-53     1
gb|U18769|HDU18769   Haemophilus ducreyi fine tangled p...        +1    150   4.0e-18     2
dbj|D01016|ANALTI46  Anabaena variabilis lti46 gene. >e...        +2    129   4.8e-12     2
gb|M84990|P26BPO     Plasmid pOP2621 ORF1 gene, 5' end;...        -2    131   6.7e-09     1
gb|U16121|HPU16121   Helicobacter pylori neutrophil act...        +1    112   1.8e-06     1
gb|M32401|TRPTYF1    T.pallidum pallidum antigen TyF1 g...        +3    101   5.6e-06     2
emb|X71436|RPNTRB    R.phaseoli ntrB gene                         +1     67   0.76        2
gb|L35598|DRODGC1A   Drosophila melanogaster receptor g...        +1     48   0.97        3
Typical Blast Output
gb|U18769|HDU18769 Haemophilus ducreyi fine tangled pili major pilin subunit gene
Length = 780
Plus Strand HSPs:
Score = 150 (68.0 bits), Expect = 4.0e-18, Sum P(2) = 4.0e-18
Identities = 36/89 (40%), Positives = 46/89 (51%), Frame = +1
Query:  30 ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV 89
           E L  ++    +L+LI K AHWN+ G  FIAVHEMLD     + D +D +AER   LG
Sbjct: 253 EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA 432
Query:  90 ALGTTQVINSKTPLKSYPLDIHNVQDHLK 118
             G +  +        YPL     QDHLK
Sbjct: 433 PNGLSGNLVETRQSPEYPLGRATAQDHLK 519
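The "Identities" figure in this report can be checked by hand from the aligned sequences. A minimal sketch, assuming the ungapped alignment exactly as printed on the slide (the variable names are illustrative):

```python
# Aligned query and subject residues from the Haemophilus ducreyi hit,
# with the two printed alignment blocks joined end to end.
QUERY = (
    "ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV"
    "ALGTTQVINSKTPLKSYPLDIHNVQDHLK"
)
SBJCT = (
    "EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA"
    "PNGLSGNLVETRQSPEYPLGRATAQDHLK"
)

aligned_length = len(QUERY)
identities = sum(q == s for q, s in zip(QUERY, SBJCT))
percent_identity = round(100 * identities / aligned_length)

# The subject coordinates are nucleotide positions: 253..519 is 267 bases,
# i.e. 89 codons translated in frame +1.
subject_codons = (519 - 253 + 1) // 3

print(f"Identities = {identities}/{aligned_length} ({percent_identity}%)")
```

Counting "Positives" as well would need a substitution matrix, since a positive is any aligned pair with a score above zero, not only an identical pair.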
Domain database searches
Pfam domains
Pfam search results
The Annotation Catastrophe (figure: protein A, "a protease", with a signal peptide, a protease domain and a coiled-coil domain; proteins B and C, each with a signal peptide). Homology between A, B and C lies in one domain. But the functional assignment for the whole of protein A comes from another domain and is carried across in error, so proteins B and C get misannotated as proteases.
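One way to guard against this kind of error is to transfer a function only when the alignment actually covers the domain that confers it. The sketch below is a hypothetical illustration: the `Domain` class, the coordinates and the 80% threshold are all invented for the example and do not come from any annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def covered_fraction(domain, aln_start, aln_end):
    """Fraction of the domain that lies inside the aligned region."""
    overlap = min(domain.end, aln_end) - max(domain.start, aln_start) + 1
    return max(overlap, 0) / (domain.end - domain.start + 1)


def can_transfer_function(domain, aln_start, aln_end, min_fraction=0.8):
    """Allow transfer only if the alignment covers most of the functional domain."""
    return covered_fraction(domain, aln_start, aln_end) >= min_fraction


# Hypothetical layout of protein A: the protease domain confers the function.
protease = Domain("protease", 40, 300)

# A hit aligning to residues 360-490 of A (outside the protease domain).
print(can_transfer_function(protease, 360, 490))
# A hit aligning to residues 35-310 of A (spanning the protease domain).
print(can_transfer_function(protease, 35, 310))
```

The first hit shares a region with protein A but not the protease domain, so calling it a protease would repeat the mistake shown on the slide.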
Annotation: rules to consider
Overview

Bio305 genome analysis and annotation 2012
