The document discusses various types of non-coding DNA sequences, including repetitive sequences, transposons, non-coding RNAs, introns, and pseudogenes. It notes that while genes only make up 2-3% of human DNA, recent projects like ENCODE have found that a much larger portion of non-coding DNA is functionally important, for example through transcriptional and translational regulation of protein-coding sequences. The document outlines different classes of transposons, introns, non-coding RNAs and their various roles in gene expression, epigenetics, and genome evolution.
Junk DNA/ Non-coding DNA and its Importance (Regulatory RNAs, RNA interference, Pseudogenes)
1. “Junk DNA” and
its importance
PRADEEP SINGH
M.Sc. Medical Biochemistry
HIMSR, Jamia Hamdard
2. JensMartensson
Content
• Background
• Introduction
• Types of noncoding DNA sequences
1. Repetitive sequences
2. Transposons
3. Non-coding RNAs
4. Introns
5. Pseudogenes
6. Cis- and trans-regulatory sequences
• Regulation of Gene Expression
• Uses
Evolution
Long range correlations
Forensic anthropology
3. JensMartensson
Background
• The Human Genome is 3.2 billion base
pairs long and contains about 25,000 –
30,000 genes.
• These genes make up only about 2-3% of our
DNA. The other 97-98% of our genome is
non-coding.
• The term “Junk DNA” was coined in the 1960s
for non-coding DNA.
• In eukaryotes, genome size, and with it the
amount of noncoding DNA, is not
correlated with organism complexity, an
observation known as the C-value
paradox. Examples: Amoeba dubia,
Utricularia gibba.
Fig: Amoeba dubia
Genome Size: 670 billion base
pairs
Fig: Utricularia gibba
Genome size: 82 million base pairs,
contains only 3% of non-coding
DNA
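To make the C-value paradox concrete, the genome sizes quoted above can be compared directly. The following is a minimal Python sketch that uses only the figures given on these slides; the dictionary and function names are illustrative and not part of the original deck.

```python
# Genome sizes in base pairs, as quoted on the slides above.
GENOME_SIZES_BP = {
    "Homo sapiens": 3.2e9,
    "Amoeba dubia": 670e9,
    "Utricularia gibba": 82e6,
}


def size_ratio(species_a, species_b):
    """Return how many times larger the genome of species_a is than that of species_b."""
    return GENOME_SIZES_BP[species_a] / GENOME_SIZES_BP[species_b]


def coding_bp(genome_bp, coding_fraction):
    """Return the number of base pairs covered by genes for a given coding fraction."""
    return genome_bp * coding_fraction


# The amoeba genome is roughly 200 times larger than the human genome.
amoeba_vs_human = size_ratio("Amoeba dubia", "Homo sapiens")

# The human genome is roughly 40 times larger than that of Utricularia gibba.
human_vs_bladderwort = size_ratio("Homo sapiens", "Utricularia gibba")

# A 2-3% coding fraction of 3.2 billion bp corresponds to about 64-96 million bp.
human_coding_low = coding_bp(GENOME_SIZES_BP["Homo sapiens"], 0.02)
human_coding_high = coding_bp(GENOME_SIZES_BP["Homo sapiens"], 0.03)
```

Genome size therefore varies by orders of magnitude without tracking organism complexity, which is the observation the paradox describes.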
4. JensMartensson
• The Human Genome Project (1990-2003): Detailed knowledge of the human genome was
anticipated to provide new avenues for advances in medicine and biotechnology.
• ENCODE: In 2012, the ENCODE project, a research program supported by
the National Human Genome Research Institute, reported that 80% of the
human genome is functional in some way. It found that much of the human genome's
noncoding DNA is transcribed and that nearly half of the
genome is in some way accessible to genetic regulatory proteins such
as transcription factors.
• Estimates based on comparative genomics, by contrast, suggest that only 8 to 15% of the human
genome is biologically functional.
5. JensMartensson
ChIP-Seq
• Chromatin immunoprecipitation sequencing (ChIP-seq) assays identify links between the
genome and the proteome by monitoring transcription regulation through histone
modification (epigenetics) or transcription factor–DNA binding interactions.
• Steps:
Step 1: Crosslinking (DNA-protein interactions)
Step 2: Cell lysis (protein–DNA extraction)
Step 3: Chromatin preparation (shearing/digestion)
Step 4: Immunoprecipitation
Step 5: Crosslinking reversal and DNA clean-up
Step 6: DNA quantification
8. JensMartensson
Introduction
• “Junk DNA” or Non-coding
DNA sequences are components of an
organism's DNA that do
not encode proteins or polypeptides.
• Some noncoding DNA is transcribed into
functional non-coding RNA molecules
(e.g. transfer RNA, ribosomal RNA,
and regulatory RNAs).
• Other functions of noncoding DNA include
the transcriptional and translational regulation
of protein-coding sequences, scaffold
attachment regions, origins of DNA
replication, centromeres and telomeres.
12. JensMartensson
2. Transposons
• The first transposons were discovered in maize (Zea mays) by
Barbara McClintock in 1948. They are also known as “jumping genes.”
• Transposons are found in both prokaryotes and eukaryotes.
• In maize, they account for 85 percent of all the DNA.
• At least 44% of the human genome consists of transposons. They clearly have roles in shaping the
structure of chromosomes and in modulating gene expression.
• They can be classified into three categories:
1. Cut & Paste transposons: Transposition is accomplished by excising an element from its position
in a chromosome and inserting it into another position. (No increase in copy number) (Found in
both prokaryotes and eukaryotes)
2. Replicative transposons: Transposition is accomplished through a process that involves
replication of the transposable element’s DNA. During this process, the element is replicated
and one copy of it is inserted at the new site; one copy also remains at the original site. (Increase
in copy number) (Found only in prokaryotes)
3. Retrotransposons: Transposition is accomplished through an RNA intermediate. Reverse transcriptase
synthesizes DNA copies of the element from the element’s RNA, and these copies are then
inserted into new chromosomal sites. (Increase in copy number) (Found only in
eukaryotes)
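The difference in copy number between the transposon classes above can be illustrated with a toy model. This is a minimal Python sketch that treats a chromosome as an ordered list of labels; the function names, the label "TE" and the gene names are hypothetical, and the model captures only the copy-number outcome, not the underlying enzymes or the RNA intermediate of retrotransposons.

```python
def cut_and_paste(genome, source, target):
    """Excise the element at index `source` and reinsert it at index `target`.

    `target` is interpreted in the genome as it exists after excision.
    The copy number of the element does not change.
    """
    result = list(genome)          # work on a copy; the input is left untouched
    element = result.pop(source)   # excision from the original site
    result.insert(target, element) # insertion at the new site
    return result


def copy_and_insert(genome, source, target):
    """Insert a new copy of the element at `target`, keeping the original copy.

    This models the net outcome of replicative transposition and of
    retrotransposition: the copy number increases by one.
    """
    result = list(genome)
    result.insert(target, result[source])
    return result


# Hypothetical toy chromosome with one transposable element ("TE").
genome = ["geneA", "TE", "geneB", "geneC"]

moved = cut_and_paste(genome, source=1, target=2)
copied = copy_and_insert(genome, source=1, target=3)
```

In this sketch `moved` still holds a single "TE" at a new position, while `copied` holds two, which mirrors the "no increase" versus "increase in copy number" notes on the slide.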
14. JensMartensson
Transposable Elements in Humans
• At least 44 percent of human DNA is derived from transposable elements,
including:
Retroviruslike Elements or LTR retrotransposons ~ 8%
Retroposons or non-LTR retrotransposons ~ 33%
Cut & Paste Transposons ~ 3%
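As a quick check of the arithmetic, the three classes listed above can be summed and converted into approximate base-pair counts. This Python sketch assumes the 3.2 billion base-pair genome size quoted on the Background slide; the variable names are illustrative.

```python
HUMAN_GENOME_BP = 3.2e9  # genome size quoted on the Background slide

# Approximate fractions of human DNA per transposable-element class (from the slide).
TE_FRACTIONS = {
    "LTR retrotransposons": 0.08,
    "non-LTR retrotransposons": 0.33,
    "cut-and-paste transposons": 0.03,
}

# Total fraction of the genome derived from transposable elements.
total_fraction = sum(TE_FRACTIONS.values())

# Approximate number of base pairs contributed by each class.
te_base_pairs = {name: frac * HUMAN_GENOME_BP for name, frac in TE_FRACTIONS.items()}
```

The fractions add up to the 44 percent stated on the slide, with non-LTR retrotransposons alone accounting for roughly one billion base pairs.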
• The principal transposable elements in humans are retroposons. They are classified
into:
LINEs: Long Interspersed Nuclear Elements
SINEs: Short Interspersed Nuclear Elements
• LINEs are the most abundant retroposons in humans. They are further classified
into three subtypes, of which L1 is described here:
L1: The human genome contains between 3,000 and 5,000 complete L1 elements,
some of which are transpositionally active. In addition, there are more than 500,000 L1 elements that
are truncated at their 5’ ends, i.e., transpositionally inactive. (Complete L1 elements are
about 6 kb long)
15. JensMartensson
• SINEs: SINEs are the second most abundant class of transposable elements
in the human genome. These elements are less than 400 base pairs long
and do not encode proteins. Thus, SINEs depend on the LINEs to multiply
and insert within the genome.
• The human genome contains three families of SINEs: Alu, MIR and
Ther2/MIR3 elements. However, only the Alu elements are transpositionally
active.
• Importance:
* In human embryos, two types of transposons (DNA transposons and retrotransposons) combine
to form non-coding RNA that promotes the development of stem cells. During the early stages of
a fetus’s growth, the embryo’s inner cell mass expands and gives rise to all cells in the body.
* Transposable elements are mutagens, and their movements can cause genetic
disease. Diseases that can be caused by transposable elements include hemophilia A and B, SCID,
porphyria, colon cancer and Duchenne muscular dystrophy.
* The Sleeping Beauty transposon system is a synthetic DNA transposon designed to introduce
precisely defined DNA sequences into the chromosomes of vertebrate animals in order
to introduce new traits and to discover new genes and their functions. It is being investigated
for use in human gene therapy.
* Cells defend against the proliferation of transposable elements in a number of ways. These
include piRNAs and siRNAs, which silence transposable elements after they have been
transcribed.
16. JensMartensson
4. Introns
• The word intron is derived from the term intragenic region, i.e. a region
inside a gene.
• The term intron refers to both the DNA sequence within a gene and the
corresponding sequence in RNA transcripts.
• Many genes, such as those for interferons, histones, many ribonucleases, heat
shock proteins and many G protein-coupled receptors, lack introns.
• Types of Introns:
Type of intron | Gene type | Splicing mechanism
tRNAs and rRNAs | tRNA genes | Enzymatic
Nuclear (pre-mRNA) | Protein-encoding genes in nuclear chromosomes | Spliceosomal
Group I | Some rRNA genes | Self-splicing
Group II | Protein-encoding genes in mitochondria | Self-splicing
17. JensMartensson
Alternative Splicing
• RNA splicing was first discovered in the 1970s
in viruses and subsequently in eukaryotes.
• Alternative splicing is a process
by which exons, portions of exons or
noncoding regions within a pre-mRNA
transcript are differentially joined or
skipped, resulting in multiple protein
isoforms being encoded by a single gene.
• The first example of alternative splicing of
a cellular gene in eukaryotes was identified
in the IgM gene, a member of the
immunoglobulin superfamily.
• Another well-known example of alternative
splicing in humans is the troponin T
gene.
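The way a single gene yields several isoforms can be sketched by enumerating exon-skipping combinations. The Python example below is a simplified illustration, assuming that only designated "cassette" exons may be skipped and that exon order is preserved; the exon names E1 to E4 and the function name are hypothetical, and other splicing modes (such as alternative splice sites within an exon) are not modeled.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, cassette):
    """Return every transcript obtained by including or skipping each cassette exon.

    exons:    ordered list of exon names in the pre-mRNA
    cassette: set of exon names that may be skipped; all others are always kept
    """
    optional = [exon for exon in exons if exon in cassette]
    isoforms = []
    # Skip 0, 1, 2, ... of the optional exons in every possible combination.
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append(tuple(exon for exon in exons if exon not in skipped))
    return isoforms


# Hypothetical gene with four exons, two of which can be skipped.
exons = ["E1", "E2", "E3", "E4"]
isoforms = exon_skipping_isoforms(exons, cassette={"E2", "E3"})
```

With two cassette exons this toy gene yields four distinct transcripts, from the full-length E1-E2-E3-E4 form down to E1-E4.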
19. JensMartensson
RNA function | RNA type | Detailed role in the cell
Protein synthesis | Transfer RNA (tRNA) | Adapter molecule that brings the amino acid corresponding to a specific mRNA codon to the ribosome. It has an anticodon (complementary to the codon), a site that binds a specific amino acid and a site that binds aminoacyl-tRNA synthetase (the enzyme catalyzing amino acid-tRNA binding).
Protein synthesis | Ribosomal RNA (rRNA) | RNA components of the ribosome, where protein is translated. Ribosomes align the anticodon of tRNA with the mRNA codon and are required for the peptidyl transferase activity catalyzing the assembly of amino acids into polypeptide chains.
20. JensMartensson
RNA function | RNA type | Detailed role in the cell
RNA interference (RNAi)/Gene silencing | Micro RNA (miRNA) | Endogenous, small, single-stranded RNAs inducing gene silencing by binding to target sites found within the 3’UTR of the targeted mRNA, resulting in translation repression or mRNA degradation. Regulating processes like cell cycle, apoptosis, etc., and implicated in a number of diseases.
RNA interference (RNAi)/Gene silencing | Small (short) interfering RNA (siRNA) | Exogenous, short, double-stranded RNAs interfering with the expression of specific genes with complementary nucleotide sequences by inducing mRNA cleavage, resulting in no translation.
RNA interference (RNAi)/Gene silencing | Piwi-interacting RNA (piRNA) | Short, single-stranded RNAs that are part of riboprotein complexes active in the germ line, ensuring germ-line stability by silencing transposons within germ cells. Occurring in clusters encoding 10 to thousands of individual piRNAs throughout the mammalian genome.
21. JensMartensson
RNA interference (RNAi)
• RNAi is an evolutionarily conserved mechanism of gene regulation
that is induced by small silencing RNAs in a sequence-specific manner.
• The phenomenon was first identified in C. elegans by Fire and Mello in 1998. lin-4 and
let-7 are two microRNAs (miRNAs) that were first discovered in C. elegans.
• The two principal types of small silencing RNA molecules are:
i. microRNA (miRNA) [endogenously produced]
ii. small interfering RNA (siRNA) [exogenously produced]
• These small RNAs have been shown to play critical roles in developmental timing,
haematopoietic cell differentiation, cell death, cell proliferation and oncogenesis.
22. JensMartensson
miRNA-mediated RNA interference
• miRNAs are synthesized in the nucleus as long transcripts, called pri-miRNAs, that are
characterized by imperfect hairpin structures.
• Drosha (RNase III) acts in conjunction with a dsRNA-binding protein called DGCR8 (in mammals) or
Pasha (in Drosophila) on the pri-miRNA and converts it into pre-miRNA.
• Some pre-miRNAs, known as mirtrons, are derived from introns.
Fig: miRNA biogenesis. Labels: pri-miRNA (~1000 nucleotides), pre-miRNA (70-100 nucleotides), Exportin 5, RNase III, mature miRNA (18-22 nucleotides)
• Pre-miRNAs are transported to the cytoplasm by exportin-5 proteins in the nuclear membrane.
• Pre-miRNA is converted into a miRNA duplex by Dicer, which also loads it into RISC (RNA-Induced
Silencing Complex).
23. JensMartensson
miRNA-mediated RNA interference
• The core of RISC is formed by proteins of the Argonaute (Ago) family.
• Argonautes are needed for miRNA-induced silencing and contain two conserved RNA-binding
domains:
i. PAZ domain: Binds the single-stranded 3’ end of the mature miRNA.
ii. PIWI domain: Structurally resembles RNase H.
• miRNAs function via base-pairing with complementary sequences within the target mRNA
molecule.
• Gene silencing occurs either via mRNA degradation or by preventing the mRNA from being
translated.
• If there is complete complementarity between the miRNA and the target mRNA sequence, Ago can
cleave the mRNA and lead to direct mRNA degradation.
• If there is incomplete complementarity, silencing is achieved by preventing translation.
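The rule in the last two bullets, where complete complementarity leads to cleavage and incomplete complementarity leads to translational repression, can be sketched in a few lines of Python. This is a deliberately simplified illustration: it compares a miRNA with an equal-length target site, treats any mismatch as "incomplete", and ignores seed-region rules and G:U wobble pairing. The sequence and function names are hypothetical.

```python
# Watson-Crick pairing rules for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def silencing_mode(mirna, target_site):
    """Classify the silencing outcome for a miRNA and an equal-length target site.

    Both sequences are written 5'->3'. A perfectly complementary site is
    reported as cleaved; any mismatch is reported as translational repression.
    """
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    perfect_site = reverse_complement(mirna)
    mismatches = sum(1 for a, b in zip(perfect_site, target_site) if a != b)
    if mismatches == 0:
        return "mRNA cleavage"
    return "translational repression"


# Hypothetical 20-nucleotide miRNA, not a real annotated sequence.
mirna = "AUGCGUACGUUAGCCGAUAC"
perfect_site = reverse_complement(mirna)

# Introduce a single mismatch at the first position of the target site.
first = "A" if perfect_site[0] != "A" else "C"
imperfect_site = first + perfect_site[1:]

perfect_outcome = silencing_mode(mirna, perfect_site)
imperfect_outcome = silencing_mode(mirna, imperfect_site)
```

A real target-prediction tool would weigh the position and number of mismatches; the sketch only mirrors the two-way distinction drawn on the slide.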
24. JensMartensson
siRNA-mediated RNA interference
siRNA versus miRNA
• siRNAs are processed from dsRNA precursors
made up of two distinct strands of perfectly
base-paired RNA, while miRNAs originate from a
single, long transcript that forms imperfectly
base-paired hairpin structures.
• The mechanism of action is the same.
piRNAs
• piRNAs (Piwi-interacting RNAs) are the most
recently discovered class of small RNAs. They are longer
(~25-30 nucleotides) and bind to the Piwi
clade of Argonaute proteins.
25. JensMartensson
5. Pseudogenes
• A pseudogene is a segment of DNA that is related to a real gene but is non-
functional. The term “pseudogene” was coined in 1977 by Jacq et al.
• Pseudogenes often result from the accumulation of multiple mutations within a
gene whose product is not required for the survival of the organism.
• Pseudogenes can complicate molecular genetic studies. For example, amplification
of a gene by PCR may simultaneously amplify a pseudogene that shares similar
sequences. This is known as PCR bias or amplification bias.
• Types and Origin: There are four main types of pseudogenes.
I. Processed Pseudogenes
II. Non-processed Pseudogenes
III. Unitary Pseudogenes
IV. Pseudo-pseudogenes
26. JensMartensson
I. Processed Pseudogenes
• Also called retrotransposed pseudogenes.
• In the formation of a processed pseudogene, the
primary transcript (hnRNA) of a gene is
processed into mature mRNA.
• The mature mRNA is spontaneously
reverse transcribed back into cDNA.
• Double-stranded DNA is synthesized from the cDNA and inserted into the chromosomal DNA.
• Processed pseudogenes usually contain a poly-A tail. They also lack the upstream promoters of
normal genes. Therefore, they usually cannot be transcribed again.
27. JensMartensson
II. Non-processed Pseudogenes
• Also called duplicated pseudogenes.
• Gene duplication is another common and
important process in the evolution of
genomes.
• A copy of a functional gene may arise from a duplication event
caused by homologous recombination at, for example, repetitive SINE sequences on
misaligned chromosomes, and
subsequently acquire mutations that cause
the copy to lose the original gene’s
function.
• Gene duplication generates
functional redundancy, and it is not
normally advantageous to carry two
identical genes.
• According to some evolutionary models, shared duplicated pseudogenes indicate the
evolutionary relatedness of humans and the other primates.
28. JensMartensson
III. Unitary Pseudogenes
• Also called disabled genes.
• Various mutations (such as indels and nonsense
mutations) can prevent a gene from being
normally transcribed or translated, and thus the
gene may become less functional or non-functional
("deactivated").
• The mechanisms are the same as those by which non-processed
genes become pseudogenes; the difference
in this case is that the gene was not duplicated
before pseudogenization.
• The classic example of a unitary pseudogene is
the gene that presumably coded for the enzyme
L-gulono-γ-lactone oxidase (GULO) in primates.
In all mammals studied other than primates
and guinea pigs, GULO aids in the
biosynthesis of ascorbic acid (vitamin C), but it
exists as a disabled gene (GULOP) in humans
and other primates.
30. JensMartensson
IV. Pseudo-pseudogenes
• Pseudogenes are generally considered
to be non-functional DNA sequences
that arise from protein-coding genes.
• Pseudogenes are often identified by
the appearance of a premature stop
codon in a predicted mRNA sequence.
• A small amount of protein product of
such mRNA sequence may still be
recognizable and function at some
level.
• In 2016, it was reported that 4
predicted pseudogenes in multiple
Drosophila species actually encode
proteins with biologically important
functions.
• Example: Olfactory receptors found
only in neurons in some species of
Drosophila.
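The premature-stop-codon criterion mentioned above can be expressed as a simple in-frame scan. The Python sketch below is a minimal illustration, assuming the reading frame starts at the first nucleotide of the given mRNA and that the expected number of coding codons is known; the sequences and function names are hypothetical. As the pseudo-pseudogene examples show, a hit from such a scan is a prediction, not proof that no functional protein is made.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(mrna):
    """Return the codon index of the first in-frame stop codon, or None if absent.

    The reading frame is assumed to start at the first nucleotide of `mrna`.
    """
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(mrna, expected_coding_codons):
    """Return True if a stop codon appears before the expected end of the coding region."""
    position = first_stop_codon(mrna)
    return position is not None and position < expected_coding_codons


# Hypothetical intact transcript: AUG GCU GAA CGU followed by the stop codon UAA.
intact = "AUGGCUGAACGUUAA"

# Same transcript with a nonsense mutation (GAA -> UAA) in the third codon.
mutant = "AUGGCUUAACGUUAA"

intact_flag = has_premature_stop(intact, expected_coding_codons=4)
mutant_flag = has_premature_stop(mutant, expected_coding_codons=4)
```

Here only the mutant transcript is flagged, because its stop codon appears at codon index 2 instead of after the four expected coding codons.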
33. JensMartensson
6. Cis- and Trans-regulatory Elements
• Cis-regulatory elements: Cis-regulatory elements are regions of non-coding DNA which
regulate the transcription of neighboring genes. Cis-elements may be located in 5' or 3'
untranslated regions (UTRs) or within introns. Many such elements are involved in the
evolution and control of development.
• Trans-regulatory elements: Trans-regulatory elements are DNA
sequences that encode transcription factors, which may modify (or regulate) the
expression of distant genes.
Examples of trans-acting factors include the products of the genes for:
I. Subunits of RNA polymerase
II. Proteins that bind to RNA polymerase to stabilize the initiation complex
III. Proteins that bind to all promoters at specific sequences, but not to RNA polymerase (TFIID
factors)
IV. Proteins that bind to a few promoters and are required for transcription initiation (positive
regulators of gene expression)
35. JensMartensson
Regulation of Gene Expression
• Some noncoding DNA sequences determine the expression levels of various genes, both
those that encode proteins and those that themselves are involved in gene
regulation.
• Transcription factors
Some noncoding DNA sequences determine where transcription factors attach. A transcription factor
is a protein that binds to specific non-coding DNA sequences, thereby controlling the flow (or
transcription) of genetic information from DNA to mRNA.
• Operators
An operator is a segment of DNA to which a repressor binds. A repressor is a DNA-binding protein that
regulates the expression of one or more genes by binding to the operator and blocking the attachment
of RNA polymerase to the promoter, thus preventing transcription of the genes. This blocking of
expression is called repression.
• Enhancers
An enhancer is a short region of DNA that can be bound by proteins (trans-acting factors, such as
a set of transcription factors) to enhance transcription levels of genes in a gene cluster.
36. JensMartensson
• Silencers
A silencer is a region of DNA that inactivates gene expression when bound by a regulatory
protein. It functions in much the same way as an enhancer, differing only in that it inactivates genes.
• Promoters
A promoter is a region of DNA that facilitates transcription of a particular gene when a
transcription factor binds to it. Promoters are typically located near the genes they regulate and
upstream of them.
• Insulators
A genetic insulator is a boundary element that plays two distinct roles in gene expression, either
as an enhancer-blocking element or, more rarely, as a barrier against condensed chromatin. An insulator in
a DNA sequence is comparable to a linguistic word divider such as a comma in a sentence,
because the insulator indicates where an enhanced or repressed sequence ends.
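The interplay of the regulatory elements described above (promoters, operators, enhancers, silencers and insulators) can be summarized as a toy decision rule. The Python sketch below is a highly simplified illustration, not a quantitative model: the function name, its parameters and the fourfold enhancer gain are all hypothetical values chosen for demonstration.

```python
def transcription_level(
    basal_rate,
    promoter_bound,
    repressor_on_operator=False,
    silencer_bound=False,
    enhancer_bound=False,
    insulator_between=False,
    enhancer_gain=4.0,  # hypothetical fold increase, for illustration only
):
    """Return a toy transcription level for one gene.

    - No transcription factor at the promoter: no transcription.
    - Repressor on the operator, or a bound silencer: transcription is blocked.
    - Bound enhancer: transcription is raised, unless an insulator lies
      between the enhancer and the gene and blocks its effect.
    """
    if not promoter_bound:
        return 0.0
    if repressor_on_operator or silencer_bound:
        return 0.0
    if enhancer_bound and not insulator_between:
        return basal_rate * enhancer_gain
    return basal_rate


basal = transcription_level(1.0, promoter_bound=True)
enhanced = transcription_level(1.0, promoter_bound=True, enhancer_bound=True)
repressed = transcription_level(1.0, promoter_bound=True, repressor_on_operator=True)
insulated = transcription_level(
    1.0, promoter_bound=True, enhancer_bound=True, insulator_between=True
)
```

Real regulation is graded and combinatorial; the sketch only encodes the qualitative roles stated on the slides.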
38. JensMartensson
Uses of “Junk” DNA
• Evolution
Shared sequences of apparently non-functional DNA are a major line
of evidence of common descent
Pseudogene sequences appear to accumulate mutations more rapidly than
coding sequences due to a loss of selective pressure. This allows for the
creation of mutant alleles that incorporate new functions that may be
favored by natural selection; thus, pseudogenes can serve as raw material
for evolution and can be considered "protogenes".
• Mapping the distance between functional genes
• Forensic anthropology
Police sometimes gather DNA as evidence for purposes of forensic
identification.
Amoeba dubia, a protozoan, has one of the largest known genomes.
Do prokaryotes have non-coding DNA?
Encyclopedia of DNA elements
ChIP-seq is a way to isolate the DNA to which a particular transcription factor binds, so that it can later be sequenced.
Molecular biologists and Evolutionary Biologists
MNase, or micrococcal nuclease
Protein Scaffold:
rRNA genes because ribosomes are required for rapid protein synthesis during various phases of cell cycle.
The excision and insertion events are catalyzed by an enzyme called transposase, which is usually encoded by the element itself.
Selfish DNA: DNA that replicates but is of no use to the host cell it inhabits.
Cut & Paste transposons in eukaryotes include the P element in Drosophila.
Tc1/mariner is a superfamily of interspersed-repeat DNA (Class II) transposons. The elements of this class are found in all animals, including humans.
The Sleeping Beauty system was named Molecule of the Year in 2009.
Two transesterification reactions:
Nuclear pre-mRNA introns contain an A at the branch site
Group I introns use a free guanosine (G) cofactor instead of a branch-site nucleotide
Group II introns contain an A at the branch site
Group I introns release a linear intron rather than a lariat
Drosha and DGCR8 form the Microprocessor complex, which helps in the processing of pri-miRNA.
Drosha partner: a dsRNA-binding protein called Pasha (in Drosophila) and DGCR8 (in mammals).
Gene redundancy is the existence of multiple genes in the genome of an organism that perform the same function.
Indel is a molecular biology term for an insertion or deletion of bases in the genome of an organism.
These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization.
Another example includes GPCRs that bind ligands but whose function is unknown; these are also referred to as pseudogenes.
Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.