1. Genomics
A discipline in genetics that applies recombinant DNA, DNA
sequencing methods, and bioinformatics to sequence, assemble,
and analyze the function and structure of genomes.
2. What is Genomics
Genomics is the field of genetics that attempts to understand the content,
organization, function, and evolution of genetic information contained in whole
genomes.
Genomics is the molecular characterization of whole genomes.
Genomics consists of two complementary fields:
Structural genomics, which determines the content, organization and sequence of
the genetic information contained within a genome, and
Functional genomics, which attempts to understand the function of
information in genomes.
A third area, comparative genomics, compares the gene content, function,
and organization of genomes of different organisms.
3. Genomics and Modern Biology
Information resulting from research in this field has made
significant contributions to:
human health,
agriculture,
and numerous other areas.
It has also provided gene sequences necessary for producing
medically important proteins through recombinant DNA
technology.
Comparisons of genome sequences from different organisms
are leading to a better understanding of evolution and the history
of life.
4. Organization of the human genome
What is a genome?
The human genome comprises two genomes:
a complex nuclear genome, which accounts for 99.9995% of
the total genetic information,
a simple mitochondrial genome,
which accounts for the remaining
0.0005%.
Mitochondrial gene expression?
5. Organization of the human genome
General structure of the mitochondrial genome
The mitochondrial genome is a single type of circular double-stranded DNA whose complete
nucleotide sequence has been established.
It is 16 569 bp in length and is 44% (G + C).
The two DNA strands have significantly different base compositions:
the heavy (H) strand is rich in guanines,
the light (L) strand is rich in cytosines.
A small section is defined by a triple-stranded DNA structure.
This is because a short segment of the heavy strand is replicated for a second time,
giving a structure known as 7S DNA.
Human cells usually contain thousands of copies of the double-stranded
mitochondrial DNA molecule.
6. The human mitochondrial genome
The D loop is marked by a triple-stranded
structure and encompasses a duplicated short
region of the heavy strand, 7S DNA.
Transcription of the heavy (H) strand actually
originates from two closely spaced promoters
located in the D loop region, which for the sake
of clarity are grouped as PH.
Transcription from these promoters runs
clockwise round the circle;
Transcription from the light strand promoter
PL runs anticlockwise.
In both cases, the resulting large transcripts are
then cleaved to generate RNAs for individual
genes.
The high coding sequence density results from the
absence of introns in all genes and close
apposition of genes, including one case of
overlapping genes: the ATPase 8 gene partly
overlaps the ATPase 6 gene.
Other polypeptide-encoding genes specify
seven NADH dehydrogenase subunits
(ND4L and ND1–ND6); three
cytochrome c oxidase subunits (CO1–CO3) and
cytochrome b (CYB).
7. Mitochondrial genes
The human mitochondrial genome contains 37 genes:
28 are encoded by the heavy strand
nine by the light strand.
Of the 37 genes, a total of 24 specify a mature RNA product:
22 mitochondrial tRNA molecules and two mitochondrial rRNA
molecules:
one 16S rRNA (a component of the large subunit of mitochondrial
ribosomes) and
one 12S rRNA (a component of the small subunit of the
mitochondrial ribosomes).
The remaining 13 genes encode polypeptides which are
synthesized on mitochondrial ribosomes.
8. The mitochondrial genetic code
13 polypeptides
The mitochondrial genetic code differs slightly from the nuclear genetic code
The mitochondrial genome encodes all the ribosomal RNA and tRNA
molecules it needs for synthesizing proteins but relies on nuclear-encoded
genes to provide all other components (such as the protein components of
mitochondrial ribosomes, amino acyl tRNA synthetases, etc.).
As there are only 22 different types of human mitochondrial tRNA,
individual tRNA molecules need to be able to interpret several different
codons.
Eight of the 22 tRNA molecules have anticodons which are each able to
recognize families of four codons differing only at the third base,
14 recognize pairs of codons which are identical at the first two base positions
and share either a purine or a pyrimidine at the third base.
Between them, therefore, the 22 mitochondrial tRNA molecules can
recognize a total of 60 codons [(8 × 4) + (14 × 2)].
The remaining four codons, UAG, UAA, AGA and AGG, cannot be
recognized by mitochondrial tRNA and act as stop codons.
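The codon bookkeeping above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the text: the variable names are illustrative, and the four stop codons are taken directly from the slide.

```python
from itertools import product

BASES = "UCAG"
# Stop codons in the human mitochondrial code, as listed in the text.
MITO_STOPS = {"UAA", "UAG", "AGA", "AGG"}

# Enumerate all 4^3 possible triplets.
all_codons = {"".join(triplet) for triplet in product(BASES, repeat=3)}
sense_codons = all_codons - MITO_STOPS

# tRNA counts from the text: 8 tRNAs read four-codon families,
# 14 tRNAs read two-codon pairs.
FOUR_CODON_TRNAS = 8
TWO_CODON_TRNAS = 14
codons_read = FOUR_CODON_TRNAS * 4 + TWO_CODON_TRNAS * 2

print(len(all_codons), len(sense_codons), codons_read)  # 64 60 60
```

The 60 codons read by the 22 tRNAs plus the four stop codons account for all 64 possible triplets.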
9. Nuclear Genome
The nucleus of a human cell contains more than 99% of the cellular
DNA.
The nuclear genome is distributed between 24 different types of DNA
duplex, which show considerable regional variation in base
composition and gene density.
Each DNA duplex has histones and other non-histone proteins bound to it, constituting a
chromosome.
A primary constriction (centromere) is present on each chromosome.
The long arms of chromosomes 1, 9 and 16 possess so-called secondary
constrictions.
The genome is divided into a euchromatic portion (3000 Mb), which was sequenced in the
Human Genome Project,
and constitutive heterochromatin (200 Mb), which is transcriptionally
inactive (found at centromeres, the long arm of Y, the short arms of
acrocentric chromosomes 13, 14, 15, 21 and 22, and the secondary
constrictions on the long arms of 1, 9 and 16).
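As a rough consistency check, the 99.9995% / 0.0005% split quoted earlier can be reproduced from the sizes given in these slides. The sketch below assumes one copy of each genome is being compared (it ignores the thousands of mitochondrial DNA copies per cell) and uses the rounded 3000 Mb and 200 Mb figures; the variable names are illustrative.

```python
# Sizes quoted in the slides.
MITO_BP = 16_569            # mitochondrial genome length in bp
EUCHROMATIN_MB = 3000       # euchromatic portion of the nuclear genome
HETEROCHROMATIN_MB = 200    # constitutive heterochromatin

nuclear_bp = (EUCHROMATIN_MB + HETEROCHROMATIN_MB) * 1_000_000
total_bp = nuclear_bp + MITO_BP

# Share of the total sequence contributed by each genome.
mito_share = MITO_BP / total_bp
nuclear_share = nuclear_bp / total_bp

print(f"nuclear: {nuclear_share * 100:.4f}%")        # nuclear: 99.9995%
print(f"mitochondrial: {mito_share * 100:.4f}%")     # mitochondrial: 0.0005%
```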
11. Base composition in the human nuclear genome
Average GC content = 42%, variable by chromosome.
Consider the dinucleotide CpG (that is, neighboring cytosine and guanine residues
on the same DNA strand in the 5′ → 3′ direction).
With C = G = 0.21, the expected frequency for the dinucleotide
CpG is (0.21)² = 0.0441.
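The expected-frequency calculation is easy to reproduce. The sketch below assumes, as the slide does, that C and G are equally frequent and that neighboring bases are independent; the function name is hypothetical.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected CpG dinucleotide frequency, assuming C and G are equally
    frequent and adjacent bases are independent."""
    freq_c = freq_g = gc_content / 2
    return freq_c * freq_g

expected = expected_cpg_frequency(0.42)  # average GC content from the slide
print(round(expected, 4))  # 0.0441
```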
In vertebrate DNA, cytosine residues occurring in CpG dinucleotides
are targets for methylation at carbon atom 5.
Only about 3% of the cytosines in human DNA are methylated, but
most that are methylated are found in the CpG dinucleotide,
producing 5-methylcytosine.
Over evolutionarily long periods of time, 5-methylcytosine
spontaneously deaminates to give thymine, and so CpG is continuously
being depleted and replaced by TpG (or CpA on the complementary
strand).
12. Gene density in the human nuclear genome
A total of about 70 000–80 000 genes, or about 3000 genes per
chromosome on average.
Gene density can vary substantially between chromosomal
regions and also between whole chromosomes.
For example, heterochromatic regions are known to be very
largely composed of repetitive noncoding DNA, and the
centromeres and large regions of the Y chromosome, in
particular, are notably devoid of genes.
Gene density is high in subtelomeric regions, and some
chromosomes (e.g. 19 and 22) are gene rich while others (e.g. 4
and 18) are gene poor.
13. Organization and distribution of human
genes
The nuclear genome contains about 65 000–80 000 genes but only about 3% of the genome
represents coding sequences
A variety of different approaches have been used to obtain more precise estimates of the total
gene number.
Genomic sequencing. Extrapolation from sequencing of large chromosomal regions
suggests that there are about 70 000 genes (Fields et al., 1994). This is based on the
observation that gene-rich regions have an average gene density of close to one per 20 kb,
whereas gene-poor regions have a much lower density, say one-tenth of this, together with
the assumption that the genome is split 50:50 into gene-rich and gene-poor regions.
CpG island number. Restriction enzyme analysis using the methylation-sensitive
enzyme HpaII suggests that the total number of CpG islands in the human genome is 45 000.
Using an estimate that approximately 56% of genes are associated with CpG islands,
Antequera and Bird (1993) suggested a total of about 80 000 human genes.
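The CpG island estimate is a simple scaling, sketched below. It assumes one CpG island per island-associated gene, which is an illustrative simplification rather than something the slide states; the variable names are also illustrative.

```python
# Figures quoted in the text.
CPG_ISLANDS = 45_000                 # estimated CpG islands in the genome
FRACTION_GENES_WITH_ISLAND = 0.56    # share of genes associated with an island

# Assumption (for illustration): each island marks exactly one gene, so the
# island count equals 56% of the total gene number.
estimated_genes = CPG_ISLANDS / FRACTION_GENES_WITH_ISLAND

print(round(estimated_genes))  # 80357, i.e. about 80 000
```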
EST analysis. Large-scale random sequencing of cDNA clones provides so-called expressed
sequence tags. Comparison of known human EST sequences with a large set of different
human genomic coding DNA sequences listed in sequence databases has suggested a figure of
about 65 000 human genes (Fields et al., 1994).
14. Functionally similar genes are occasionally clustered in the
human genome, but are more often dispersed over different
chromosomes
In the case of polypeptide-encoding gene families, some genes
encoding identical or functionally related products are clustered, but
often they are dispersed on several chromosomes.
Functionally identical genes
A very few human polypeptides are known to be encoded by two or
more identical gene copies.
Often, these are encoded by recently duplicated genes in a gene
cluster, as in the case of the duplicated α-globin genes.
In addition, some genes on different chromosomes encode identical
polypeptides.
Examples include members of histone gene subfamilies.
Histones can be classified into five groups in terms of structure: H1
(the linker histone) and the four core histones, H2A, H2B, H3 and H4.
15. In addition, histone genes can be
classified into three groups according to
expression:
(i) replication-dependent (restricted to
the S phase of the cell cycle)
(ii) replication-independent (expressed
at a low level throughout the cell cycle)
(iii) tissue-specific, e.g. the H1t and
H3t genes are expressed exclusively in
the testis.
There appears to be a total of 61 human
histone genes which comprise several
subfamilies (Albig and Doenecke, 1997).
Most of the histone genes are found in
two multifamily clusters on the short
arm of chromosome 6, but genes on
several other human chromosomes can
specify identical copies of a particular
histone subtype.
Chromosomal distribution of
the human histone gene family
Eleven clusters comprising a total
of about 60 histone genes are
distributed over seven human
chromosomes. The two clusters
on 6p contain the great majority of
histone genes. Other clusters
contain only one or two of the
histone gene subtypes. Note that
identical histones can be specified
by genes on different
chromosomes.
16. Functionally similar genes
A large fraction of human genes are members of gene families
where individual genes are closely related but not identical in
sequence.
In many such cases the genes are clustered and have arisen by
tandem gene duplication, as in the case of the different members
of each of the α-globin and β-globin gene clusters.
Genes which encode clearly related products but which are
located on different chromosomes are generally less related, as in
the case of the α-globin and β-globin genes.
17. Functionally related genes
Some genes encode products which may not be so closely related
in structure, but are clearly functionally related.
The products may be subunits of the same protein or
macromolecular structure, components of the same metabolic or
developmental pathway, or may be required to specifically bind
to each other as in the case of ligands and their relevant
receptors.
In almost all such cases, the genes are not clustered and are
usually found on different chromosomes
18. Human genes show enormous variation in size
and internal organization
Size Diversity
Genes in simple organisms such as bacteria
are comparatively similar in size, and
usually very short.
Complex organisms such as mammals show
wide variation in gene size, a feature found
especially in human genes, which can vary in
length from hundreds of nucleotides to
several megabases.
The enormous size of some human genes
means that transcription can be time-
consuming.
For example, the human dystrophin gene
requires about 16 hours to be transcribed
19. Diversity in internal organization
There is an inverse correlation between gene size and the
proportion of the gene length which is expressed at the RNA level
A very small minority of human genes lack introns and are
generally very small genes.
For those that do possess introns, the exon content as a
percentage of gene length tends to be very small in large genes.
This does not arise because exons in large genes are smaller than
those in small genes: the average exon size in human genes is
about 200 bp and, although very large exons are known, exon size
is comparatively independent of gene length
Instead, the explanation lies in the huge variation in intron
lengths: large genes tend to have very large introns.
22. Overlapping genes and genes within genes are
known in the human genome
Partially overlapping genes
The genes of simple organisms are generally more clustered than those in complex organisms.
The average gene density in the human genome is about one per 40–45 kb of DNA.
Assuming a mean size of, say, 10–15 kb, human genes should be separated by about 30 kb of
nongenic DNA on average.
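The spacing estimate follows from subtracting the mean gene size from the average DNA per gene. A minimal sketch using the ranges quoted above (the helper name is hypothetical):

```python
# Ranges quoted in the text, in kb.
KB_PER_GENE = (40, 45)     # average density: one gene per 40-45 kb
MEAN_GENE_KB = (10, 15)    # assumed mean gene size

def intergenic_range_kb(kb_per_gene, gene_kb):
    """Return the (smallest, largest) nongenic DNA per gene, in kb."""
    smallest = kb_per_gene[0] - gene_kb[1]   # tightest spacing, largest genes
    largest = kb_per_gene[1] - gene_kb[0]    # widest spacing, smallest genes
    return smallest, largest

low, high = intergenic_range_kb(KB_PER_GENE, MEAN_GENE_KB)
midpoint = (low + high) / 2
print(low, high, midpoint)  # 25 35 30.0
```

The midpoint of the 25–35 kb range matches the "about 30 kb" figure in the text.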
By contrast, average gene densities in simple organisms are very much higher: roughly one
per 1, 2 and 5 kb, respectively, for E. coli, Saccharomyces cerevisiae and Caenorhabditis elegans.
Simple genomes such as those of certain phages and bacteria often show examples of partially
overlapping genes which use different reading frames, sometimes from a common sense
strand.
The human mitochondrial genome is another example of a simple genome packed with
genetic information, and it too has an example of such overlapping genes.
The degree of gene clustering in the nuclear genome is largely dependent on the chromosomal
region, and in regions of high density occasional examples of overlapping genes have been
noted.
For example, the class III region of the HLA complex at 6p21.3 has an average gene density
of about one gene per 13 kb, and is known to contain several examples of overlapping genes.
23. Genes within genes
The small nucleolar RNA (snoRNA) genes are unusual in that the majority of them are
located within other genes, often ones which encode a ribosome-associated protein or a
nucleolar protein. Possibly this arrangement has been maintained to permit coordinate
production of protein and RNA components of the ribosome (Tycowski et al., 1993).
In addition to the snoRNA genes there are a few examples of other genes being located within
the introns of larger genes, and in some cases the internal genes as well as the host genes are
known to encode polypeptides. Three illustrative examples are:
The neurofibromatosis type I (NF1) gene. Intron 26 of the NF1 gene spans about 40 kb and
contains three small genes, each with two exons, which are transcribed from the opposite
strand to that used for the NF1 gene (Viskochil et al., 1991).
The factor VIII gene. Intron 22 of the blood clotting factor VIII gene (F8C) contains a CpG
island from which two internal genes, F8A and F8B are transcribed in opposite directions
(Levinson et al., 1992).
F8A is transcribed from the opposite strand to that used by the factor VIII gene.
F8B is transcribed in the same direction as the factor VIII gene to give a short mRNA containing
a new exon spliced on to exons 23–26 of the factor VIII gene
The retinoblastoma susceptibility gene RB1. Intron 17 of this gene is 72 kb long and contains a G
protein-coupled receptor gene, U16, which is actively transcribed from the opposite strand
24. Genes within genes: intron 26 of the gene for neurofibromatosis type I
(NF1) contains three internal genes each with two exons:
The three internal genes are transcribed from the opposing strand to that used
for transcription of the NF1 gene. The genes are: OMGP, oligodendrocyte
myelin glycoprotein; EVI2A and EVI2B, human homologs of murine genes
thought to be involved in leukemogenesis, and located at ecotropic viral
integration sites.