COMPARATIVE GENOMICS
Presentation by,
ATHIRA RG
BBM051603
M.Sc. Biochemistry & Molecular Biology
 Comparative genomics involves a comprehensive
systematic comparison of genome sequences.
 It begins with powerful computer programs that
identify homologous regions within the genomes
under comparison.
 Sets of homologous sequences are then grouped
with their sequences aligned at the base-pair level
in an attempt to define whole genome sequence
alignments.
• Discover what lies hidden in genomic sequences
by comparing sequence information.
By comparing the human genome with the genomes of
different organisms, researchers can better understand the
structure and function of human genes and thereby develop
new strategies in the battle against human disease.
 In addition, comparative genomics provides a powerful
new tool for studying evolutionary changes among
organisms, helping to identify the genes that are conserved
among species along with the genes that give each
organism its own unique characteristics.
SOME QUESTIONS THAT COMPARATIVE GENOMICS CAN ADDRESS
How has the organism evolved?
What differentiates species?
Which non-coding regions are important?
Which genes are required for organisms to survive in a
certain environment?
PHYLOGENETIC DISTANCE
 The information that can be gained by comparing
genomes depends largely on the phylogenetic
distance between them.
 Phylogenetic distance is a measure of the
degree of separation between two organisms or
genomes on an evolutionary scale, usually
expressed as the number of accumulated
sequence changes, the number of years, or the
number of generations.
 The greater the distance, the lower the sequence
similarity and the fewer the shared genomic features.
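The "number of accumulated sequence changes" can be illustrated with a minimal sketch. It assumes two sequences that are already aligned, computes the observed proportion of differing sites (the p-distance), and applies the Jukes-Cantor correction for multiple substitutions at the same site. The function names and example sequences are hypothetical and chosen only for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of ungapped aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns that contain a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("p >= 0.75 is saturated; the distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments that differ at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as the observed proportion, because some sites will have changed more than once.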
Comparisons of Genomes at Different Phylogenetic Distances Are
Appropriate to Address Different Questions
 Broad insights about types of genes can be gleaned by genomic comparisons at very long
phylogenetic distances, e.g., greater than 1 billion years since their separation.
 For example, comparing the genomes of yeast, worms, and flies reveals that these
eukaryotes encode many of the same proteins, and the non-redundant protein sets of flies
and worms are about the same size, being only twice that of yeast.
 The more complex developmental biology of flies and worms is reflected in the greater
number of signaling pathways in these two species than in yeast.
 Over such very large distances, the order of genes and the sequences regulating their
expression are generally not conserved.
 At moderate phylogenetic distances (roughly 70–100 million years of divergence), both
functional and nonfunctional DNA are found within the conserved DNA.
 In these cases, the functional sequences show a signature of purifying (negative)
selection: they have changed less than the nonfunctional or neutral DNA
(Jukes and Kimura 1984).
COMMONLY USED TOOLS
 UCSC Browser: This site contains the reference sequence and working draft
assemblies for a large collection of genomes.
 Ensembl: The Ensembl project produces genome databases for vertebrates and
other eukaryotic species, and makes this information freely available online.
 MapView: The Map Viewer provides a wide variety of genome mapping and
sequencing data.
 VISTA is a comprehensive suite of programs and databases for comparative
analysis of genomic sequences. It was built to visualize the results of
comparative analysis based on DNA alignments. The presentation of
comparative data generated by VISTA suits both small-scale and large-scale
datasets.
 BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale
viewing of annotated genomes and other genomic elements.
HOW ARE GENOMES COMPARED?
 Chromosome level
Number of genes
Genome size
Content (sequence)
Location (map position)
Gene order
Gene cluster (genes that are part of a known metabolic pathway and are found
to exist as a group)
Translocation: movement of a part of the genome from one position to another
Different ways of comparison
• Whole genome
– Genome alignments
– Synteny (gene order conservation)
– Anomalous regions
• Gene-centric
– Gene families and unique genes
– Gene clustering by function
• Gene sequence variations
– Codon usage, SNPs, indels, pseudogenes
GENOME ALIGNMENT
 Alignment of DNA sequences is the core process in
comparative genomics.
 An alignment is a mapping of the nucleotides in one
sequence onto the nucleotides in the other sequence,
with gaps introduced into one or the other sequence to
increase the number of positions with matching
nucleotides.
 Several powerful alignment algorithms have been
developed to align two or more sequences.
Popular alignment programs such as BLAST and FASTA, or the multiple alignment program Clustal
W, are essentially optimized for the alignment
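The definition of an alignment given above (a mapping between nucleotides, with gaps introduced to increase the number of matching positions) can be sketched with a small global aligner. This is the textbook Needleman-Wunsch dynamic program, not the heuristic used by BLAST or FASTA, and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

The running time and memory grow with the product of the two sequence lengths, which is one reason dedicated tools are needed at genome scale.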
Computational tools for genome-scale sequence alignment
 Human PKLR gene region compared to the macaque, dog, mouse, chicken,
and zebrafish genomes.
Numbers on the vertical axis represent the proportion of identical nucleotides
in a 100-bp window for a point on the plot. Numbers on the horizontal axis
indicate the nucleotide position from the beginning of the 12-kilobase human
genomic sequence. Peaks shaded in blue correspond to the PKLR coding regions.
Peaks shaded in light blue correspond to PKLR mRNA untranslated regions. Peaks
shaded in red correspond to conserved non-coding regions (CNSs), defined as
areas where the average identity is > 75%. The alignment was generated using the
sequence comparison tool VISTA (http://pipeline.lbl.gov).
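The quantity plotted in such a figure can be approximated with a short sketch: percent identity in a sliding window over a pairwise alignment, with windows above 75% flagged as conserved. This is not VISTA's own code. As a simplifying assumption, windows here are counted in alignment columns rather than reference-sequence positions, and the function names and step size are hypothetical.

```python
def percent_identity_windows(ref_aln, other_aln, window=100, step=20):
    """Return (start_column, percent_identity) for each window of the alignment."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    # 1 where the column is an identical, ungapped pair; 0 otherwise.
    matches = [int(a == b and a != "-")
               for a, b in zip(ref_aln.upper(), other_aln.upper())]
    results = []
    for start in range(0, len(matches) - window + 1, step):
        identity = 100.0 * sum(matches[start:start + window]) / window
        results.append((start, identity))
    return results


def conserved_windows(results, threshold=75.0):
    """Keep windows whose identity exceeds the threshold (75% in the caption)."""
    return [(start, identity) for start, identity in results
            if identity > threshold]


# Toy alignment with a 5-column window so the numbers are easy to follow.
windows = percent_identity_windows("ACGTACGTAC", "ACGTAGGGGG", window=5, step=5)
print(windows)
print(conserved_windows(windows))
```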
GENOME ALIGNMENT
 Notice the high degree of sequence similarity between human and macaque
(two primates) in the PKLR exons (blue), the introns (red), and the
untranslated regions (light blue) of the gene.
 In contrast, the chicken and zebrafish alignments with human only show
similarity to sequences in the coding exons; the rest of the sequence has
diverged to a point where it can no longer be reliably aligned with the human
DNA sequence.
 Using such computer-based analysis to zero in on the genomic features that
have been preserved in multiple organisms over millions of years, researchers
are able to locate the signals that represent the location of genes, as well as
sequences that may regulate gene expression.
 Indeed, many of the functional parts of the human genome have been
discovered or verified by this type of sequence comparison (Lander et al. 2001),
and it is now a standard component of the analysis of every new genome
sequence.
Comparison of overall nucleotide statistics
• Overall nucleotide statistics, such as
– Genome size,
– Overall (G+C) content,
– Regions of different (G+C) content,
– Genome signature such as codon usage biases,
– Amino acid usage biases, and the ratio of observed to expected dinucleotide frequency
These all present a global view of the similarities and differences of the genomes.
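Two of these statistics are simple enough to sketch directly: overall (G+C) content, and the ratio of observed to expected dinucleotide frequency. The sketch assumes a single strand and takes the expected frequency as the product of the two single-base frequencies. Published genome signatures often also combine both strands, which is omitted here. The function names are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / total if total else 0.0


def dinucleotide_odds_ratios(seq):
    """Observed / expected frequency for every dinucleotide present in seq."""
    seq = seq.upper()
    mono = Counter(base for base in seq if base in "ACGT")
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in "ACGT" and seq[i + 1] in "ACGT")
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    for pair, count in di.items():
        # Expected frequency if neighbouring bases were independent.
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios


print(gc_content("GGCCAATT"))
print(dinucleotide_odds_ratios("ACACACAC"))
```

A ratio well above 1 marks an over-represented dinucleotide, and a ratio well below 1 an under-represented one.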
SYNTENY
 Refers to regions of two genomes that show considerable similarity in terms of
 sequence and
 conservation of the order of genes,
and that are therefore likely to be related by common descent.
By mapping syntenic regions in corresponding genomes, genome rearrangement
events such as fission, translocation, inversion, and transposition can be identified.
SYNTENY
Analysis of Breakpoints
Once syntenic regions are detected, one can obtain the breakpoints (a.k.a. syntenic boundaries)
between syntenic regions.
Analysis of various genomic features of the breakpoints, such as G+C content, gene density,
and the density of various DNA repeats, provides understanding of the evolution of
genomes.
For instance, Mural et al. observed a sharp discontinuity of features around some syntenic
boundaries but not others.
They hypothesized that syntenic boundaries that do not show sharp transitions in these
various features may provide evidence for conservation of the ancestral pattern in the
lineage.
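A toy sketch of how syntenic blocks and their breakpoints can be derived from gene order follows. It assumes the orthologs between two genomes are already known and that each gene of genome A is represented by the rank of its ortholog in genome B. A block is a run of genes whose ranks increase or decrease by exactly one. Real synteny tools tolerate gaps and missing genes, which this sketch does not, and the function names are hypothetical.

```python
def syntenic_blocks(b_order):
    """b_order[i] is the rank in genome B of the ortholog of gene i in genome A.

    Returns (first_index, last_index) pairs, in genome A coordinates, for runs
    that are collinear (+1 steps) or inverted (-1 steps) in genome B.
    """
    blocks = []
    start = 0
    direction = 0  # 0 = not yet known, +1 = collinear, -1 = inverted
    for i in range(1, len(b_order)):
        step = b_order[i] - b_order[i - 1]
        if step in (1, -1) and direction in (0, step):
            direction = step
        else:
            blocks.append((start, i - 1))
            start, direction = i, 0
    if b_order:
        blocks.append((start, len(b_order) - 1))
    return blocks


def breakpoints(blocks):
    """Pairs of adjacent genome A indices between which a block ends."""
    return [(end, next_start)
            for (_, end), (next_start, _) in zip(blocks, blocks[1:])]


# Hypothetical order: genes 3..6 of genome A are inverted in genome B.
order_in_b = [0, 1, 2, 6, 5, 4, 3, 7, 8]
blocks = syntenic_blocks(order_in_b)
print(blocks)
print(breakpoints(blocks))
```

Features such as G+C content or repeat density can then be compared on either side of each reported breakpoint.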
GENE CENTRIC COMPARISON
Homologs:
Genes that share a common ancestor; in general they retain the same function
Orthologs:
Homologs from different species (arise from speciation)
Paralogs:
Homologs from the same species (arise from duplication)
 Duplication before speciation (ancient duplication): out-paralogs; may not
have the same function
 Duplication after speciation (recent duplication): in-paralogs; likely to have
the same function
GENE CLUSTERS
 In prokaryotes, groups of functionally related genes tend to be
located in close proximity to each other, and often in specific order,
as exemplified by operons.
 Although gene order conservation beyond the level of operons is
much less prevalent, conservation of clusters and gene order can be
important indicators of function.
 Several approaches have been used to determine functionally
related "clusters" of genes.
 Overbeek et al. use the constructs of a "pair of close bidirectional
best hits" (PCBBH) and "pairs of close homologs" (PCHs) to
represent pairs of genes that are closely conserved between two
species and likely to be functionally related.
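The idea of a pair of close bidirectional best hits can be sketched as follows. The sketch assumes pairwise similarity scores between the genes of two genomes are already available (for example from an all-against-all search) and that gene positions are given as simple ranks along the chromosome. The data layout, the function names, and the closeness cutoff `max_dist` are hypothetical choices for illustration, not the published procedure.

```python
from itertools import combinations


def best_hits(scores):
    """scores maps (query, subject) -> similarity; return best subject per query."""
    best = {}
    for (query, subject), value in scores.items():
        if query not in best or value > best[query][1]:
            best[query] = (subject, value)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Gene pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def pairs_of_close_bbh(bbh, pos_a, pos_b, max_dist=1):
    """Pairs of BBHs whose genes are neighbours in both genomes."""
    close = []
    for (a1, b1), (a2, b2) in combinations(bbh, 2):
        if (abs(pos_a[a1] - pos_a[a2]) <= max_dist
                and abs(pos_b[b1] - pos_b[b2]) <= max_dist):
            close.append(((a1, a2), (b1, b2)))
    return close


# Hypothetical scores and gene ranks for two small genomes.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 180, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 200, ("b2", "a2"): 180, ("b2", "a1"): 50, ("b3", "a3"): 70}
bbh = bidirectional_best_hits(a_vs_b, b_vs_a)
print(bbh)
print(pairs_of_close_bbh(bbh, {"a1": 0, "a2": 1, "a3": 5}, {"b1": 3, "b2": 4, "b3": 9}))
```

Gene a3 is excluded because its best hit, b1, has a better match in a1, so the relationship is not reciprocal.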
COGs
Clusters of Orthologous Genes.
 Groups of three or more orthologous genes,
 meaning they are direct evolutionary counterparts and are considered to be part of an 'ancient conserved domain'.
 A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than
to any other protein within the individual genomes.
 COGs can be used to predict the function of homologous proteins in poorly studied species and can also be used to track
the evolutionary divergence from a common ancestor,
 hence providing a powerful tool for functional annotation of uncharacterized proteins.
 Important in comparative genomics studies.
Application of COG
 The most straightforward application of the COGs is the prediction of functions of individual
proteins or protein sets, including those from newly completed genomes.
COG database
NCBI provides a COG database that consists of 4,873 COGs comprising over 136,000
proteins from the genomes of 50 bacteria, 13 archaea, and 3 unicellular eukaryotes. This
database uses completely sequenced genomes to classify proteins using the orthology concept.
MBGD
 MBGD is a database for comparative analysis
of completely sequenced microbial genomes,
the number of which is now growing rapidly.
 The aim of MBGD is to facilitate comparative
genomics from various points of view, such as
ortholog identification, paralog clustering,
motif analysis, and gene order comparisons.
COMPARATIVE ANALYSIS OF CODING REGIONS
 Comparative analysis of coding regions typically involves the identification of gene-coding regions,
comparison of gene content, and comparison of protein content.
 Recently there have also been a number of algorithms developed that
use comparative genomics to aid function prediction of genes.
The analysis and comparison of the coding regions starts with, and is
very dependent upon, the gene identification algorithm that is used to
infer what portions of the genomic sequence actively code for genes.
A combination of multiple gene identification approaches is often used in large-scale analysis to
improve the overall accuracy.
COMPARATIVE ANALYSIS OF NON-CODING REGIONS
 Noncoding regions of the genome, which may comprise as much as 97%
of the genome length, as in the human genome, have gained a lot of
attention in recent years because of their predicted role in the regulation of
transcription, DNA replication, and other biological functions.
 However, identification of regulatory elements in the noncoding
portion of a genome remains a challenge.
 Comparative genomics has greatly aided the identification of
regulatory segments by comparing the noncoding DNA
sequences of diverse species to identify conserved regions.
 This approach is based on the presumption that selective pressure
causes regulatory elements to evolve at a slower rate than
nonregulatory sequences in the noncoding regions.
ANALYSIS OF MUTATIONS
 Search and display of mutations within multiple alignments, with
discrimination between intergenic, synonymous, non-synonymous
and Indel mutations.
 Additional filtering based on SNP quality scores.
 Display colors based on mutation type or quality; sorting based on
position, gene, NA change, AA change, quality
 Direct clustering based upon mutations or export of mutation list
for further analysis.
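The four mutation classes named above (intergenic, synonymous, non-synonymous, and indel) can be told apart with a minimal sketch. It assumes the standard genetic code, a coding sequence given 5' to 3' in frame, and single-base substitutions. The function name and argument layout are hypothetical.

```python
from itertools import product

# Standard genetic code, built with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_mutation(cds, pos, ref, alt):
    """Classify one variant.

    cds: in-frame coding sequence, or None for a variant outside any gene.
    pos: 0-based offset of the variant within cds (ignored if cds is None).
    ref, alt: reference and alternative alleles.
    """
    if len(ref) != len(alt):
        return "indel"
    if cds is None:
        return "intergenic"
    frame = pos % 3
    ref_codon = cds[pos - frame:pos - frame + 3].upper()
    # Substitute the single changed base inside its codon.
    alt_codon = ref_codon[:frame] + alt.upper() + ref_codon[frame + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


cds = "ATGGCTTAA"  # hypothetical gene: Met, Ala, stop
print(classify_mutation(cds, 5, "T", "C"))    # GCT to GCC
print(classify_mutation(cds, 3, "G", "A"))    # GCT to ACT
print(classify_mutation(cds, 4, "C", "CA"))   # one-base insertion
print(classify_mutation(None, 0, "A", "G"))   # outside any gene
```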
PSEUDOGENES?
 Nonfunctional protein-coding genes
 Mutations introduce “sequence problems” (frameshifts, in-frame stops, absence of a stop codon)
 “Normal” bacterial genomes have 1-5% pseudogenes [Liu et al]
 Pseudogenes can give interesting clues to evolutionary pathways
 High fractions of pseudogenes suggest a “genome degradation” process
 May be cause or effect of niche restriction
 Examples
 Mycobacterium leprae: 36% (~1,100 genes)
 Leifsonia xyli subsp. xyli: 13% (~300 genes)
 Pseudogenes may not show up in BLAST searches against protein databases
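The "sequence problems" named on this slide can be screened for with a crude sketch. It assumes a candidate coding sequence given 5' to 3' from its start codon, and it treats a length that is not a multiple of three as a sign of a frameshift, which is only a rough proxy. Real pseudogene detection compares the sequence against an intact homolog. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sequence_problems(cds):
    """List the problems that suggest a coding sequence may be a pseudogene."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("frameshift")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop")
    return problems


print(sequence_problems("ATGGCTTAA"))     # intact toy gene
print(sequence_problems("ATGTAAGCTTAA"))  # premature stop codon
print(sequence_problems("ATGGCTTA"))      # one base lost near the end
```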
APPLICATIONS
Gene identification
 Comparative genomics can aid gene identification by recognizing real
genes based on their patterns of nucleotide conservation across evolutionary time. With
genome-wide alignments available across the genomes compared, the different ways in
which sequences change in known genes and in intergenic regions can be analyzed. The
alignments of known genes reveal the conservation of the reading frame of protein
translation.
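One way to see reading-frame conservation in an alignment is to look at gap lengths: insertions and deletions in real coding sequence tend to come in multiples of three, which keeps the frame intact. The sketch below is a simplified heuristic under that assumption, not a published gene-finding method. It measures the fraction of gap runs in a pairwise alignment that preserve the frame, and the function names are hypothetical.

```python
import re


def gap_lengths(aln_a, aln_b):
    """Lengths of all runs of '-' in either aligned sequence."""
    lengths = []
    for seq in (aln_a, aln_b):
        lengths.extend(len(run.group()) for run in re.finditer(r"-+", seq))
    return lengths


def frame_preserving_fraction(aln_a, aln_b):
    """Fraction of gap runs whose length is a multiple of three."""
    lengths = gap_lengths(aln_a, aln_b)
    if not lengths:
        return 1.0  # no gaps at all, so the frame is trivially preserved
    return sum(length % 3 == 0 for length in lengths) / len(lengths)


# A 3-base deletion keeps the frame; a 1-base deletion breaks it.
print(frame_preserving_fraction("ATGGCT---GAATAA", "ATGGCTCCCGAATAA"))
print(frame_preserving_fraction("ATGGC-TGAA", "ATGGCTTGAA"))
```

A region scoring close to 1 across several species is more likely to be a real gene than one in which gaps of arbitrary length are common.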
Regulatory motif discovery
 Regulatory motifs are short DNA sequences, about 6 to 15 bp long, that control the
expression of genes, dictating the conditions under which a gene will be turned on or off. Each
motif is typically recognized by a specific DNA-binding protein called a transcription factor (TF).
A transcription factor binds precise sites in the promoter region of target genes in a
sequence-specific way, but this contact can tolerate some degree of sequence variation. Comparative
genomics provides a powerful way to distinguish regulatory motifs from non-functional patterns
based on their conservation.
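Conservation-based motif discovery can be sketched in its simplest form: given promoter sequences from several species that are already aligned column by column, report every window of length k that is identical in all of them. Exact identity is a deliberate simplification, since real binding sites tolerate some variation. The species labels, sequences, and function name are hypothetical.

```python
def conserved_kmers(aligned_promoters, k=6):
    """Return (column, k-mer) for windows identical in every aligned sequence."""
    seqs = [seq.upper() for seq in aligned_promoters.values()]
    if not seqs:
        return []
    length = min(len(seq) for seq in seqs)
    hits = []
    for i in range(length - k + 1):
        window = seqs[0][i:i + k]
        # Skip gapped windows; require the same k-mer in all other species.
        if "-" not in window and all(seq[i:i + k] == window for seq in seqs[1:]):
            hits.append((i, window))
    return hits


# Hypothetical aligned promoter fragments sharing a conserved core.
promoters = {
    "species_1": "TTGACGTCAATC",
    "species_2": "AAGACGTCAGGC",
    "species_3": "CTGACGTCATTA",
}
print(conserved_kmers(promoters, k=6))
```

Overlapping hits, as in this example, can be merged into a single longer conserved element.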
APPLICATIONS
 Comparative genomics has wide applications in the fields of molecular
medicine and molecular evolution. The most significant application of
comparative genomics in molecular medicine is the identification of drug
targets for many infectious diseases. For example, comparative analyses of
fungal genomes have led to the identification of many putative targets for
novel antifungals. This discovery can aid target-based drug design to treat
fungal diseases in humans.
 Comparative genomics also helps in the clustering of regulatory sites, which
can help in the recognition of unknown regulatory regions in other genomes.
The regulation of metabolic pathways can also be recognized by means of
comparative genomics.
 Agriculture is a field that reaps the benefits of comparative genomics.
Identifying the loci of advantageous genes is a key step in breeding crops
that are optimized for greater yield, cost-efficiency, quality, and disease
resistance.
Thank You

Comparative genomics

  • 1. COMPARATIVE GENOMICS Presentation by, ATHIRA RG BBM051603 M.Sc. Biochemistry & Molecular Biology
  • 2.  Comparative genomics involves a comprehensive systematic comparison of genome sequences.  It begins with powerful computer programs that identify homologous regions within the genomes under comparison.  Sets of homologous sequences are then grouped with their sequences aligned at the base-pair level in an attempt to define whole genome sequence alignments. • Discover what lies hidden in genomic sequences by comparing sequence information.
  • 3. By comparing the human genome with the genomes of different organisms, researchers can better understand the structure and function of human genes and thereby develop new strategies in the battle against human disease.  In addition, comparative genomics provides a powerful new tool for studying evolutionary changes among organisms, helping to identify the genes that are conserved among species along with the genes that give each organism its own unique characteristics.
  • 4. SOME QUESTIONS THAT COMPARATIVE GENOMICS CAN ADDRESS? How has the organism evolved? What differentiates species? Which non-coding regions are important? Which genes are required for organisms to survive in a certain environment?
  • 5. PHYLOGENETIC DISTANCE  Information that can be gained by comparison of genomes largely dependent upon the phylogenetic distances between them.  Phylogenetic distance is a measure of the degree of separation b/w two organisms or genomes on an evolutionary scale , usually expressed as the number of accumulated sequence changes, number of years or number of generations  More distance, less sequence similarity or less shared genomic features.
  • 6. Comparisons of Genomes at Different Phylogenetic Distances Are Appropriate to Address Different Questions
  • 7.  Broad insights about types of genes can be gleaned by genomic comparisons at very long phylogenetic distances, e.g., greater than 1 billion years since their separation.  For example, comparing the genomes of yeast, worms, and flies reveals that these eukaryotes encode many of the same proteins, and the non-redundant protein sets of flies and worms are about the same size, being only twice that of yeast.  The more complex developmental biology of flies and worms is reflected in the greater number of signaling pathways in these two species than in yeast.  Over such very large distances, the order of genes and the sequences regulating their expression are generally not conserved.  At moderate phylogenetic distances (roughly 70–100 million years of divergence), both functional and nonfunctional DNA is found within the conserved DNA.  In these cases, the functional sequences will show a signature of purifying or negative selection, which is that the functional sequences will have changed less than the nonfunctional or neutral DNA (Jukes and Kimura 1984).
  • 8. COMMONLY USED TOOLS UCSC Browser: This site contains the reference sequence and working draft assemblies for a large collection of genomes. Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online. MapView: The Map Viewer provides a wide variety of genome mapping and sequencing data. VISTA: a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments, and its presentation of comparative data suits both small- and large-scale data. BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements.
  • 9.
  • 10. HOW ARE GENOMES COMPARED? Chromosome level; number of genes; genome size; content (sequence); location (map position); gene order; gene clusters (genes that are part of a known metabolic pathway are found to exist as a group); translocation (movement of a genomic part from one position to another).
  • 11.
  • 12. Different ways of comparison. Whole genome: genome alignments; synteny (gene order conservation); anomalous regions. Gene-centric: gene families and unique genes; gene clustering by function. Gene sequence variations: codon usage, SNPs, indels, pseudogenes.
  • 13. GENOME ALIGNMENT Alignment of DNA sequences is the core process in comparative genomics. An alignment is a mapping of the nucleotides in one sequence onto the nucleotides in the other sequence, with gaps introduced into one or the other sequence to increase the number of positions with matching nucleotides. Several powerful alignment algorithms have been developed to align two or more sequences. Popular alignment programs such as BLAST and FASTA or the multiple alignment program Clustal W are essentially optimized for the alignment
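The definition of an alignment above (a nucleotide-to-nucleotide mapping with gaps inserted to increase matches) can be illustrated with a minimal global alignment by dynamic programming, in the style of Needleman-Wunsch. This is a teaching sketch, not how BLAST, FASTA, or Clustal W work internally; the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (aligned_a, aligned_b, score) for a global pairwise alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def substitution(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    for line in needleman_wunsch("ACGT", "AGT")[:2]:
        print(line)
```

The table has (n+1) x (m+1) cells, so this approach is practical for gene-sized sequences only; genome-scale tools rely on heuristics such as seeding and chaining.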
  • 14. Computational tools for genome-scale sequence alignment
  • 15. GENOME ALIGNMENT Human PKLR gene region compared to the macaque, dog, mouse, chicken, and zebrafish genomes. Numbers on the vertical axis represent the proportion of identical nucleotides in a 100-bp window for a point on the plot. Numbers on the horizontal axis indicate the nucleotide position from the beginning of the 12-kilobase human genomic sequence. Peaks shaded in blue correspond to the PKLR coding regions. Peaks shaded in light blue correspond to PKLR mRNA untranslated regions. Peaks shaded in red correspond to conserved non-coding regions (CNSs), defined as areas where the average identity is > 75%. Alignment was generated using the sequence comparison tool VISTA (http://pipeline.lbl.gov).
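The kind of curve described in this caption can be approximated with a sliding-window identity profile over a pairwise alignment. The sketch below is a simplified illustration and not VISTA's actual implementation: it slides the window over alignment columns (gaps count as mismatches), and the function names are hypothetical. The 100-column window and the 75% threshold are taken from the caption.

```python
def window_identity(aligned_ref: str, aligned_other: str, window: int = 100, step: int = 1):
    """Return a list of (window_start, percent_identity) over alignment columns."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("sequences must be aligned to the same length")
    # 1 where the column is an identical, non-gap pair, else 0
    matches = [
        1 if r == o and r != "-" else 0
        for r, o in zip(aligned_ref.upper(), aligned_other.upper())
    ]
    profile = []
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        profile.append((start, 100.0 * identical / window))
    return profile


def conserved_windows(profile, threshold: float = 75.0):
    """Window starts whose identity exceeds the threshold (75% in the caption)."""
    return [start for start, pct in profile if pct > threshold]


if __name__ == "__main__":
    prof = window_identity("AAAA", "AAAT", window=2)
    print(prof, conserved_windows(prof))
```

Whether a conserved window is coding, untranslated, or non-coding is decided afterwards by overlaying gene annotation, which this sketch does not attempt.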
  • 16. Notice the high degree of sequence similarity between human and macaque (two primates) in the PKLR exons (blue) as well as in the introns (red) and untranslated regions (light blue) of the gene. In contrast, the chicken and zebrafish alignments with human only show similarity to sequences in the coding exons; the rest of the sequence has diverged to a point where it can no longer be reliably aligned with the human DNA sequence. Using such computer-based analysis to zero in on the genomic features that have been preserved in multiple organisms over millions of years, researchers are able to locate the signals that represent the location of genes, as well as sequences that may regulate gene expression. Indeed, many of the functional parts of the human genome have been discovered or verified by this type of sequence comparison (Lander et al. 2001), and it is now a standard component of the analysis of every new genome sequence.
  • 17. Comparison of overall nucleotide statistics. Overall nucleotide statistics, such as genome size, overall (G+C) content, regions of different (G+C) content, genome signatures such as codon usage biases, amino acid usage biases, and the ratio of observed to expected dinucleotide frequency, all present a global view of the similarities and differences of the genomes.
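Two of these statistics are simple enough to sketch directly: overall (G+C) content, and a dinucleotide odds ratio, taken here as the observed dinucleotide frequency divided by the product of the two single-nucleotide frequencies. The sketch counts one strand only and ignores ambiguous bases; both are simplifying assumptions, and the function names are illustrative.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / total


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Observed/expected ratio f_xy / (f_x * f_y) for each dinucleotide seen."""
    seq = seq.upper()
    mono = Counter(base for base in seq if base in "ACGT")
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    return {
        pair: (count / n_di) / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(dinucleotide_odds_ratios("ACACACAC"))
```

A ratio near 1 means the dinucleotide occurs about as often as base composition alone predicts; values well above or below 1 indicate over- or under-representation.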
  • 18. SYNTENY Refers to regions of two genomes that show considerable similarity in terms of sequence and conservation of the order of genes, and that are likely to be related by common descent. By mapping syntenic regions in corresponding genomes, genome rearrangement events such as fission, translocation, inversion, and transposition can be identified.
  • 20. Analysis of Breakpoints. Once syntenic regions are detected, one can obtain the breakpoints (a.k.a. syntenic boundaries) between syntenic regions. Analysis of various genomic features of the breakpoints, such as G+C content, gene density, and the density of various DNA repeats, provides understanding of the evolution of genomes. For instance, Mural et al. observed a sharp discontinuity of features around some syntenic boundaries but not others. They hypothesized that syntenic boundaries that do not show sharp transitions in these various features may provide evidence for conservation of the ancestral pattern in the lineage.
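A toy version of breakpoint detection can be sketched on gene order alone. The assumptions are loud ones: each genome is reduced to an ordered list of shared orthologous gene names, gene orientation is ignored, and a breakpoint is any pair of genes adjacent in the first genome but not adjacent in the second. Real synteny pipelines work from sequence alignments and handle duplications and missing genes.

```python
def breakpoints(order_a, order_b):
    """Adjacent gene pairs in order_a that are not adjacent in order_b.

    Assumption: both lists contain the same set of genes, each exactly once.
    """
    if set(order_a) != set(order_b) or len(order_a) != len(set(order_a)):
        raise ValueError("both genomes must contain the same unique genes")
    position_b = {gene: index for index, gene in enumerate(order_b)}
    broken = []
    for left, right in zip(order_a, order_a[1:]):
        # Adjacent in genome B means positions differ by exactly one (either order).
        if abs(position_b[left] - position_b[right]) != 1:
            broken.append((left, right))
    return broken


if __name__ == "__main__":
    genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
    genome_b = ["g1", "g2", "g5", "g4", "g3", "g6"]  # g3..g5 inverted
    print(breakpoints(genome_a, genome_b))
```

In the usage example a single inversion produces two breakpoints, one at each end of the inverted segment; the features named on the slide (G+C content, gene density, repeat density) would then be examined around those boundaries.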
  • 21. GENE CENTRIC COMPARISON Homologs: genes that have the same ancestor; in general they retain the same function. Orthologs: homologs from different species (arise from speciation). Paralogs: homologs from the same species (arise from duplication). Duplication before speciation (ancient duplication): out-paralogs; may not have the same function. Duplication after speciation (recent duplication): in-paralogs; likely to have the same function.
  • 22. GENE CLUSTERS In prokaryotes, groups of functionally related genes tend to be located in close proximity to each other, and often in a specific order, as exemplified by operons. Although gene order conservation beyond the level of operons is much less prevalent, conservation of clusters and gene order can be important indicators of function. Several approaches have been used to determine functionally related “clusters” of genes. Overbeek et al. use the constructs of a “pair of close bidirectional best hits” (PCBBH) and “pairs of close homologs” (PCHs) to represent pairs of genes that are closely conserved between two species and likely to be functionally related.
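The "bidirectional best hit" building block can be sketched as follows. The input is assumed to be two dictionaries of similarity scores, one per search direction, such as might be parsed from all-against-all similarity searches; the data layout and function names are hypothetical. The additional "close" condition of a PCBBH (both gene pairs lying near each other on their respective chromosomes) is deliberately left out.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    scores: {(query_gene, subject_gene): similarity_score}
    Ties are resolved in favour of the first pair encountered (an arbitrary choice).
    """
    best = {}
    for (query, subject), value in scores.items():
        if query not in best or value > best[query][1]:
            best[query] = (subject, value)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (gene_a, gene_b) that are each other's best hit in both directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 60}
    b_vs_a = {("b1", "a1"): 245, ("b2", "a2"): 175, ("b2", "a3"): 55}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here a3's best hit is b2, but b2's best hit is a2, so a3 is not reported; only reciprocal pairs survive.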
  • 23. COGs Clusters of Orthologous Groups: groups of three or more orthologous genes, meaning they are direct evolutionary counterparts and are considered to be part of an 'ancient conserved domain'. A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than to any other protein within the individual genomes. COGs can be used to predict the function of homologous proteins in poorly studied species and to track evolutionary divergence from a common ancestor, hence providing a powerful tool for functional annotation of uncharacterized proteins. They are important in comparative genomics studies.
  • 24. Application of COG The most straightforward application of the COGs is the prediction of functions of individual proteins or protein sets, including those from newly completed genomes. COG database: NCBI provides a COG database that consists of 4,873 COGs that code for over 13600 proteins from the genomes of 50 bacteria, 13 archaea and 3 unicellular eukaryotes. This database uses completely sequenced genomes to classify proteins using the orthology concept.
  • 25. MBGD MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis, and gene order comparisons.
  • 26. COMPARATIVE ANALYSIS OF CODING REGIONS Typically involves the identification of gene-coding regions, comparison of gene content, and comparison of protein content. Recently, a number of algorithms have also been developed that use comparative genomics to aid function prediction of genes. The analysis and comparison of the coding regions starts with, and is very dependent upon, the gene identification algorithm that is used to infer which portions of the genomic sequence code for genes.
  • 27. A combination of multiple gene identification approaches is often used in large-scale analysis to improve the overall accuracy.
  • 28. COMPARATIVE ANALYSIS OF NON CODING REGIONS Noncoding regions, which may comprise as much as 97% of the genome length, as in the human genome, have gained a lot of attention in recent years because of their predicted role in the regulation of transcription, DNA replication, and other biological functions. However, identification of regulatory elements from the noncoding portion of a genome remains a challenge. Comparative genomics has greatly aided the identification of regulatory segments by comparing the genomic noncoding DNA sequences from diverse species to identify conserved regions. This approach is based on the presumption that selective pressure causes regulatory elements to evolve at a slower rate than non-regulatory sequences in the noncoding regions.
  • 29. ANALYSIS OF MUTATIONS Search and display of mutations within multiple alignments, with discrimination between intergenic, synonymous, non-synonymous, and indel mutations. Additional filtering based on SNP quality scores. Display colors based on mutation type or quality; sorting based on position, gene, NA change, AA change, or quality. Direct clustering based upon mutations, or export of the mutation list for further analysis.
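The synonymous versus non-synonymous distinction can be sketched for a single-base substitution inside a coding sequence. The sketch assumes the standard genetic code, a coding sequence that starts in frame at position 0, and 0-based positions; the function name is illustrative and does not correspond to any particular tool's interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_cds: str, position: int, alt_base: str):
    """Classify a single-base change in an in-frame coding sequence.

    Returns (kind, ref_amino_acid, alt_amino_acid), where kind is
    'synonymous' or 'non-synonymous'. position is 0-based.
    """
    codon_start = position - position % 3
    ref_codon = ref_cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete final codon")
    offset = position % 3
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_substitution("ATGGCT", 5, "C"))  # GCT -> GCC
    print(classify_substitution("ATGGCT", 3, "A"))  # GCT -> ACT
```

Intergenic changes and indels need no codon lookup: the former fall outside annotated coding coordinates, and the latter show up as gap columns in the alignment.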
  • 30. PSEUDOGENES? Nonfunctional protein-coding genes. Mutations introduce “sequence problems” (frameshifts, in-frame stops, absence of a stop codon). “Normal” bacterial genomes have 1-5% pseudogenes [Liu et al]. Pseudogenes can give interesting clues to evolutionary pathways. High fractions of pseudogenes suggest a “genome degradation” process, which may be a cause or an effect of niche restriction. Examples: Mycobacterium leprae: 36% (~1,100 genes); Leifsonia xyli subsp. xyli: 13% (~300 genes). Pseudogenes do not show up in BLAST searches.
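The three "sequence problems" listed on this slide can be checked mechanically on a candidate coding sequence. The sketch below is a rough screen under stated assumptions: the sequence is given on the coding strand starting at its first codon, the stop codons are TAA, TAG, and TGA, and a length that is not a multiple of three is treated as a hint of a frameshift. Real pseudogene annotation compares each gene against intact homologs rather than inspecting it in isolation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def sequence_problems(cds: str):
    """Return a list of problems that suggest a disrupted coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Any stop codon before the last codon truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    return problems


if __name__ == "__main__":
    for candidate in ("ATGGCTTAA", "ATGTAAGCTTAA", "ATGGCT", "ATGGCTTA"):
        print(candidate, sequence_problems(candidate))
```

An empty list means none of the three problems was found, which is necessary but not sufficient for a gene to be functional.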
  • 31. APPLICATIONS Gene identification: Comparative genomics can aid gene identification by recognizing real genes based on their patterns of nucleotide conservation across evolutionary time. With the availability of genome-wide alignments across the genomes compared, the different ways by which sequences change in known genes and in intergenic regions can be analyzed. The alignments of known genes will reveal the conservation of the reading frame of protein translation. Regulatory motif discovery: Regulatory motifs are short DNA sequences, about 6 to 15 bp long, that are used to control the expression of genes, dictating the conditions under which a gene will be turned on or off. Each motif is typically recognized by a specific DNA-binding protein called a transcription factor (TF). A transcription factor binds precise sites in the promoter region of target genes in a sequence-specific way, but this contact can tolerate some degree of sequence variation. Comparative genomics provides a powerful way to distinguish regulatory motifs from non-functional patterns based on their conservation.
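The conservation idea behind regulatory motif discovery can be shown with a deliberately strict toy: report every gap-free window of length k that is identical in all sequences of a multiple alignment. The default k of 6 is the lower end of the motif lengths quoted on the slide. Requiring perfect identity is an assumption made for brevity; since transcription factor binding tolerates some variation, practical methods score partial conservation instead.

```python
def conserved_kmers(aligned_seqs, k: int = 6):
    """Return (start, kmer) for windows identical and gap-free in every sequence."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    upper = [seq.upper() for seq in aligned_seqs]
    hits = []
    for start in range(length - k + 1):
        window = upper[0][start:start + k]
        if "-" in window:
            continue  # skip windows containing gaps
        if all(seq[start:start + k] == window for seq in upper[1:]):
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    promoters = ["TTGACGTCAAA", "CAGACGTCATT", "GGGACGTCACC"]
    print(conserved_kmers(promoters, k=6))
```

Overlapping hits, as in the usage example, indicate a conserved block longer than k; merging them would give the full extent of the candidate motif.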
  • 32. APPLICATIONS Comparative genomics has wide applications in the fields of molecular medicine and molecular evolution. The most significant application of comparative genomics in molecular medicine is the identification of drug targets for many infectious diseases. For example, comparative analyses of fungal genomes have led to the identification of many putative targets for novel antifungals. This discovery can aid target-based drug design to cure fungal diseases in humans. Comparative genomics also helps in the clustering of regulatory sites, which can help in the recognition of unknown regulatory regions in other genomes. The regulation of metabolic pathways in a species can also be recognized by means of comparative genomics. Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance.