2. • Genomics - It is the study of genomes.
• The field of genomics comprises two
main areas:
1. Comparative genomics
2. Functional genomics
3. • Comparative genomics is a large-scale, holistic approach that
compares two or more genomes to discover the similarities
and differences between the genomes and to study the
biology of the individual genomes
• The subject of comparative genomics impinges on
– Evolutionary biology and phylogenetic reconstructions of the tree of
life,
– Drug discovery programs,
– Function predictions of hypothetical proteins
– Identification of genes, regulatory motifs and other non-coding
DNA motifs
– Genome flux and dynamics
4. Methods for comparative genomics
• Comparative analysis of genome structure
• Comparative analysis of coding regions
• Comparative analysis of non-coding regions
5. • The structure of different genomes can be compared at
three levels:
– Overall nucleotide statistics,
– Genome structure at the DNA level, and
– Genome structure at the gene level
6. • Overall nucleotide statistics include:
– Genome size,
– Overall (G+C) content,
– Regions of different (G+C) content,
– Genome signatures such as codon usage biases and amino acid usage biases, and
– The ratio of the observed dinucleotide frequency to the frequency
expected under a random nucleotide distribution
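Two of these statistics are simple enough to sketch in code. The Python below is a minimal illustration, not a standard tool: the function names are hypothetical, and the expected dinucleotide frequency is assumed to be the product of the two single-base frequencies.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def dinucleotide_odds_ratio(seq: str, dinuc: str) -> float:
    """Observed dinucleotide frequency divided by the frequency expected
    if bases were drawn independently (product of base frequencies)."""
    seq = seq.upper()
    dinuc = dinuc.upper()
    n = len(seq)
    if n < 2:
        return 0.0
    # Count overlapping occurrences of the dinucleotide.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc) / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    return observed / expected if expected > 0 else 0.0
```

A ratio well above 1 means the dinucleotide is over-represented relative to a random sequence of the same base composition; a ratio well below 1 means it is under-represented.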
7. • Chromosomal breakage and the exchange of chromosomal
fragments are a common mode of gene evolution. They can be
studied by comparing genome structures at the DNA level:
– Identification of conserved synteny and genome
rearrangement events
– Analysis of breakpoints
– Analysis of the content and distribution of DNA repeats
8. • Chromosomal breakage and exchange
of chromosomal fragments cause
disruption of gene order
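• Therefore, the degree of gene order conservation correlates with the
evolutionary distance between genomes: the more closely related two
genomes are, the more of their gene order they share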
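One simple way to quantify disrupted gene order is to count breakpoints, that is, neighbouring gene pairs in one genome that are no longer neighbours in the other. The sketch below assumes each gene occurs once per genome, ignores strand, and treats each genome as a single linear chromosome; the function names are hypothetical.

```python
def adjacencies(order):
    """Set of unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def count_breakpoints(order_a, order_b):
    """Number of adjacencies in order_a that are absent from order_b.

    Only genes present in both genomes are compared.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]
    return len(adjacencies(a) - adjacencies(b))
```

For example, `count_breakpoints(list("ABCDE"), list("ABDCE"))` compares two orders that differ by swapping C and D. A larger count suggests more rearrangement and hence, under the reasoning above, a greater evolutionary distance.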
9. • Identification of gene-coding regions
• Comparison of gene content
• Comparison of protein content
• Comparative genome-based function prediction
10. • Noncoding regions of the genome have gained a lot
of attention in recent years because of their
predicted role in the regulation of transcription,
DNA replication, and other biological
functions
• Comparative analysis of these regions is based on the presumption that
selective pressure causes regulatory elements
to evolve at a slower rate than the nonregulatory
sequences in the noncoding
regions
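If regulatory elements evolve more slowly, they should show up as unusually well-conserved stretches in an alignment of noncoding sequence from two species. The sketch below scans a pairwise alignment with a sliding window; the window size and identity threshold are illustrative assumptions, and the two input strings are assumed to be already aligned (equal length, gaps written as "-").

```python
def conserved_windows(aln_a, aln_b, window=20, min_identity=0.9):
    """Return (start, identity) for every window whose fraction of
    identical, non-gap columns is at least min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aln_a) - window + 1):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        # A column counts as a match only if both bases agree and neither is a gap.
        matches = sum(1 for x, y in cols if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits
```

Windows that pass the threshold are only candidates for regulatory elements; conservation alone does not establish function.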
11. • Comparative genomic studies shed light
on the pathogenesis of organisms, open up
opportunities for therapeutic intervention, and
help in understanding and identifying disease genes
• One of the most important outcomes of comparative
analyses at a genome-wide scale is the ability to
identify and develop novel drug targets
• If one is looking for antibacterial, antifungal, or
antiprotozoal proteins to be used as targets,
comparative genome analysis can reveal virulence
genes, uncharacterized essential genes, and species-specific
or organism-specific genes, while ensuring
that the chosen genes have no homologues in humans
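The target-selection logic in the last bullet amounts to a set filter: keep pathogen genes of interest and discard any with a human homologue. The sketch below assumes the homology search has already been done and its results are available as a set of gene names; the function name and all gene names are hypothetical placeholders.

```python
def candidate_targets(genes_of_interest, genes_with_human_homologue):
    """Pathogen genes of interest (e.g. virulence or essential genes)
    that have no detected homologue in the human genome."""
    return sorted(set(genes_of_interest) - set(genes_with_human_homologue))


# Hypothetical placeholder names, for illustration only.
virulence_or_essential = {"geneA", "geneB", "geneC"}
has_human_homologue = {"geneB"}
targets = candidate_targets(virulence_or_essential, has_human_homologue)
```

In practice the quality of the result depends on how sensitively homologues were detected, which this sketch does not address.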
13. • Functional genomics is largely experiment based, with a focus on
gene functions at the whole-genome level
using high-throughput approaches.
• The high-throughput analysis of all expressed
genes is also termed transcriptome
analysis
• Transcriptome analysis can be conducted by
two approaches:
1) Sequence-based approaches
2) Microarray-based approaches
14. • Expressed sequence tags (ESTs) are short
cDNA sequences, typically 200-400 nucleotides in
length.
• They are obtained from either the 5’ end or the 3’ end of cDNA inserts in a cDNA
library.
Advantages of ESTs:
• They provide a rough estimate of the genes that are actively
expressed in a genome under a particular physiological
condition.
• They help in discovering new genes, because cDNA clones are
sequenced at random.
• EST libraries can be easily generated
15. Drawbacks of using ESTs:
• They are generated automatically without verification and thus
contain high error rates.
• There is often contamination by vector
sequences, introns, ribosomal RNA, and
mitochondrial RNA.
• Weakly expressed genes are rarely found in EST
sequencing surveys.
• ESTs represent only partial sequences of genes.
16. Major EST index databases
are:
• UniGene, an NCBI EST cluster
database.
• TIGR Gene Indices
17. • Serial analysis of gene expression (SAGE)
• Another high-throughput, sequence-based
approach for gene expression profile analysis.
• More quantitative in determining mRNA
expression in a cell.
• Short fragments (tags) of DNA, excised from
cDNA sequences, act as unique markers of gene
transcripts
• SAGE was invented at the Johns Hopkins University
Oncology Center (USA) by Dr. Victor Velculescu in 1995.
18. • A short sequence tag (10-14 bp) contains
sufficient information to uniquely identify a
transcript.
• Sequence tags can be linked together to
form long serial molecules that can be
cloned and sequenced.
• The number of times a particular tag is
observed provides the expression level of
the corresponding transcript
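The counting principle can be sketched as follows. This is a simplified illustration rather than a real SAGE pipeline: it assumes ditags in a sequenced concatemer are separated by the anchoring-enzyme site CATG, that each tag is 10 bp, and that no tag contains CATG internally. The constants and function names are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: recognition site of the anchoring enzyme
TAG_LENGTH = 10       # assumption: tag length in bp

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Split a concatemer into ditags and return the two tags of each.

    The left tag is read directly; the right tag sits on the opposite
    strand, so it is reverse-complemented.
    """
    tags = []
    for ditag in concatemer.upper().split(ANCHOR_SITE):
        if len(ditag) < 2 * TAG_LENGTH:
            continue  # skip empty or truncated fragments
        tags.append(ditag[:TAG_LENGTH])
        tags.append(reverse_complement(ditag[-TAG_LENGTH:]))
    return tags


def tag_counts(concatemers):
    """Count how often each tag is observed across all concatemers."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer))
    return counts
```

The resulting counts are the raw expression estimates; mapping each tag back to its transcript requires a separate tag-to-gene reference.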
19. Trapping of RNA with beads
cDNA synthesis
Enzymatic cleavage of cDNA
Ligation of linkers to bound cDNA
Cleaving with tagging enzyme
Formation of ditags
PCR amplification of ditags
Isolation and concatemerization of ditags
20. • Software tools for SAGE analysis:
• SAGEmap
• SAGE xProfiler
• SAGE Genie
• Advantages over EST analysis:
a. It detects weakly expressed genes
b. It uses a short nucleotide tag and allows
sequencing of multiple tags in a single clone
21. • A microarray is a pattern of ssDNA probes that are
immobilized on a surface called a chip or a slide.
• Microarrays use hybridization to detect a specific DNA
or RNA in a sample.
• A DNA microarray can carry on the order of a million different probes, fixed
on a solid surface.
• Microarray technology evolved from Southern
blotting.
22. • To analyze the expression of thousands of
genes in a single reaction, quickly and
efficiently.
• To understand the genetic causes for
the abnormal functioning of the
human body.
• To understand which genes are active
and which genes are inactive in
different cell types.
23. • Oligonucleotides are in the range of 25-70 bases
long
• The oligonucleotides are called probes
• An oligonucleotide should not form a stable internal
secondary structure.
• All probes should have approximately equal Tm.
• OligoWiz and OligoArray are two programs used for
designing probes for microarray spotting.
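Two of these design rules, probe length and similar Tm, can be checked with a few lines of code. The sketch below is not how OligoWiz or OligoArray work; it uses a rough GC-based Tm approximation, Tm = 64.9 + 41 × (G+C − 16.4) / N, and hypothetical function names and tolerance. Secondary structure is not checked.

```python
def melting_temp(probe):
    """Rough Tm estimate in degrees Celsius from length and G+C count.

    Uses the approximation 64.9 + 41 * (GC - 16.4) / N, which ignores
    salt concentration and nearest-neighbour effects.
    """
    probe = probe.upper()
    gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(probe)


def select_probes(candidates, target_tm, tolerance=2.0, min_len=25, max_len=70):
    """Keep candidates of acceptable length whose estimated Tm lies
    within `tolerance` degrees of `target_tm`."""
    return [
        probe for probe in candidates
        if min_len <= len(probe) <= max_len
        and abs(melting_temp(probe) - target_tm) <= tolerance
    ]
```

Filtering all probes against one target Tm keeps hybridization conditions comparable across the array, which is the reason for the equal-Tm rule.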
1) The analysis and comparison of coding regions starts with a gene identification algorithm, which is used to infer what portions of the genomic sequence code for genes. Gene identification can be based on transcription evidence, gene homology, or genome similarity.
2) The first statistic to compare is the estimated total number of genes in a genome. Other statistics that elucidate the similarities and differences between genomes include the percentage of the genome that codes for genes, the distribution of coding regions across the genome (a.k.a. gene density), average gene length, and codon usage. BLASTN or TBLASTX is used for these comparisons.
3) It is important to compare the protein contents in critical pathways and important functional categories across genomes.
Two widely used resources for pathways and functional categories are the KEGG pathway database and the Gene Ontology (GO) hierarchy.
4) Genes can also be assigned functions in a non-similarity-based manner. This relies on the basic premise that genes that are functionally related are closely associated across genomes in some form.
These approaches include three methods:
Co-conservation across genomes
Conservation of gene clusters and genomic context across species
Physical fusion of functionally linked genes across species (domain fusion analysis)
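The first of these methods, co-conservation across genomes, is often described in terms of presence/absence profiles: genes that are present in the same set of genomes and absent from the same set are candidates for a functional link. The sketch below assumes each gene's profile is a tuple of 0/1 values over the same ordered list of genomes; the function names, threshold, and gene names are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    agree = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return agree / len(profile_a)


def linked_genes(profiles, min_similarity=1.0):
    """Return gene pairs whose profiles agree in at least
    `min_similarity` of the genomes."""
    names = sorted(profiles)
    pairs = []
    for i, gene_1 in enumerate(names):
        for gene_2 in names[i + 1:]:
            if profile_similarity(profiles[gene_1], profiles[gene_2]) >= min_similarity:
                pairs.append((gene_1, gene_2))
    return pairs


# Hypothetical profiles over four genomes (1 = present, 0 = absent).
example_profiles = {
    "geneA": (1, 0, 1, 1),
    "geneB": (1, 0, 1, 1),
    "geneC": (0, 1, 0, 1),
}
example_links = linked_genes(example_profiles)
```

With only a handful of genomes many profiles match by chance, so this kind of comparison becomes informative only when many genomes are included.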
Isolate total RNA containing mRNA that ideally represents the genes that are expressed at the time of sample collection.
Prepare cDNA from the mRNA using a reverse transcriptase enzyme.
A short primer is required to initiate cDNA synthesis.
Each cDNA (sample and control) is labelled with a fluorescent cyanine dye (i.e. Cy3 or Cy5).
The labelled cDNA is competitively hybridized against cDNA molecules spotted on a glass slide.
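After competitive hybridization, the relative expression at each spot is usually summarized as a log ratio of the two dye intensities. The sketch below assumes, purely for illustration, that the sample is labelled with Cy5 and the control with Cy3, and that a single background value is subtracted from both channels; the function name is hypothetical.

```python
import math


def log2_ratio(cy5, cy3, background=0.0):
    """log2 of the background-corrected Cy5/Cy3 intensity ratio for one spot.

    Returns None when either corrected signal is not positive, since the
    ratio is then undefined.
    """
    signal_5 = cy5 - background
    signal_3 = cy3 - background
    if signal_5 <= 0 or signal_3 <= 0:
        return None
    return math.log2(signal_5 / signal_3)
```

Under the stated dye assignment, a positive value indicates higher expression in the sample than in the control, a negative value indicates lower expression, and zero indicates no difference.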