Bio info 5

Bioinformatics Lecture# 5
Dr. Naeem Ud Din Khattak
Professor
Department of Zoology
Islamia College Peshawar (Chartered University)

Phylogenetic Tree Construction

• The mutation distance: the minimal number of nucleotides that would need to be altered in order for the gene for one protein to code for the other.
• Example: the sequences ACTGAT and TCTATC, aligned:
    A C T G A T -
    T C T - A T C
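
As a rough illustration, the differences in the aligned example can be counted column by column. This is only a sketch: the function name `count_differences` is hypothetical, and it assumes that every gap counts as one change, which the slide does not state.

```python
def count_differences(aligned_a, aligned_b):
    """Count alignment columns where the two sequences differ (gaps included)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)

# Aligned sequences from the slide; "-" marks a gap.
# Assumption: a gap column is counted as a single change.
distance = count_differences("ACTGAT-", "TCT-ATC")
print(distance)
```

Counting columns only gives the mutation distance if the alignment itself is the best one; finding that alignment is a separate step.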


The construction of the tree
• Assume three proteins, A, B and C, and their mutation distances:

        B     C
   A   24    28
   B         32

• There are two questions:
1. Which pair does one join together first?
2. What are the lengths of edges a, b, and c?

Which pair does one join together first?
• Simply choose the pair with the smallest mutation distance. Here that is A and B (distance 24).

        B     C
   A   24    28
   B         32
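
The selection rule can be sketched in a few lines. The dictionary layout and the name `closest_pair` are illustrative assumptions, not part of the lecture.

```python
# Mutation distances from the table (each pair listed once).
distances = {("A", "B"): 24, ("A", "C"): 28, ("B", "C"): 32}

def closest_pair(dist):
    """Return the pair of taxa with the smallest mutation distance."""
    return min(dist, key=dist.get)

first_pair = closest_pair(distances)
print(first_pair, distances[first_pair])
```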


What are the lengths of legs a, b, and c?
• a, b and c are the legs leading to A, B and C.

        B     C
   A   24    28
   B         32

a + b = 24
a + c = 28
b + c = 32

Solution: a = 10, b = 14, c = 18

• i. a + b = 24   ii. a + c = 28   iii. b + c = 32

• From i: a = 24 - b. Put this value of a in ii:
• 24 - b + c = 28; c - b = 28 - 24; c - b = 4; so c = 4 + b

• Put this value of c in iii: b + 4 + b = 32; 2b + 4 = 32; 2b = 32 - 4

• b = 28/2 = 14
• Now put the value of b in i: a = 24 - 14 = 10; and c = 4 + b = 4 + 14 = 18
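
The same substitution can be written in closed form: adding equations i and ii and subtracting iii gives 2a, and similarly for b and c. A minimal sketch follows; the function name `three_taxon_legs` is hypothetical.

```python
def three_taxon_legs(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the three leg lengths."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

a, b, c = three_taxon_legs(24, 28, 32)
print(a, b, c)
```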

• Note that this analysis assumes that there are no multiple substitutions, i.e. cases in which a single site undergoes two or more changes (e.g. the ancestral sequence … ATGT … gives … AGGT … and … ACGT …).
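
A small sketch of why this matters, under the assumption that AGGT and ACGT are two descendants that each changed the same site of the ancestor independently: two changes happened, but comparing the descendants reveals only one difference.

```python
def differences(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)

ancestor = "ATGT"
descendant_1 = "AGGT"  # T -> G at the second site
descendant_2 = "ACGT"  # T -> C at the same site

# Changes that actually occurred along the two lineages (assumed scenario).
actual_changes = differences(ancestor, descendant_1) + differences(ancestor, descendant_2)
# Difference visible when only the two descendants are compared.
observed = differences(descendant_1, descendant_2)
print(actual_changes, observed)
```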

Phylogenetic Tree Terminology
[Figure: a rooted tree with taxa A to E]
• Terminal nodes: represent the taxa (genes, populations, species, etc.) used to infer the phylogeny
• Branches or lineages
• Internal nodes or divergence points: represent hypothetical ancestors of the taxa
• Ancestral node or root of the tree

Based on lectures by C-B Stewart, and by
Tal Pupko

Phylogenetic trees diagram the evolutionary relationships between the taxa

[Figure: tree with tips Taxon B, Taxon C, Taxon A, Taxon D, Taxon E]
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses

[Figure: the same tree with two clades marked, one containing Taxon A, Taxon B and Taxon C, the other containing Taxon D and Taxon E]
((A,(B,C)),(D,E))

• B and C are more closely related to each other than either is to A,
• and A, B, and C form a clade that is a sister group to the clade composed of D and E.
• If the tree has a time scale, then D and E are the most closely related.
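
To make the nested-parentheses notation concrete, the sketch below parses ((A,(B,C)),(D,E)) into nested tuples and lists the clade under every internal node. The helper names (`parse_nested`, `leaves`, `clades`) are hypothetical, and the parser only handles plain labels without branch lengths.

```python
def parse_nested(text):
    """Parse a label-only nested-parentheses tree into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(parse_node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos                       # a leaf label
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()

def leaves(node):
    """Return all tip labels below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def clades(node):
    """Return the sorted tip set of every internal node, root first."""
    if isinstance(node, str):
        return []
    result = [sorted(leaves(node))]
    for child in node:
        result.extend(clades(child))
    return result

tree = parse_nested("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(clade)
```

The listing includes the clades discussed on the slide: B with C, A with B and C, and D with E.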

• Nature acts conservatively, i.e., it does not
develop a new kind of biology for every life
form but continuously changes and adapts a
proven general concept.
• Novel functionalities do not appear because a
new gene has suddenly arisen but are
developed and modified during evolution.
• Thus, alleles of a gene found in a population arise from a common ancestral gene: they are HOMOLOGOUS.

• Homology is not a measure of similarity; rather, it means that sequences have a shared evolutionary history and, therefore, possess a common ancestral sequence (Tatusov et al. 1997).
• It is an all-or-none phenomenon.

Orthologs
• Homologous proteins from different species
that possess the same function
(e.g., corresponding kinases in a signal
transduction pathway in humans and mice)
are called orthologs.
Paralogs
• Homologous proteins that have different
functions in the same species (e.g., two
kinases in different signal transduction
pathways of humans) are termed paralogs.

• A visual representation of orthologs (and
some other commonly confused
terms, paralogs and homologs)

Orthologs: "genes that have diverged after a speciation event...
[that] tend to have similar function" (Fulton et al. 2006).
Thus, orthologs are genes whose encoded proteins fulfill
similar roles in different species.

• Homology is not quantifiable.
• The similarity and identity of two sequences, however, ARE.

Identity
• The ratio of the number of identical amino acids or nucleotides relative to the total number of amino acids or nucleotides.
• Example: 4 identical positions out of 20 gives 4/20 = 0.2.
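
A minimal sketch of the identity calculation. The two 20-residue sequences are invented purely so that 4 of 20 positions match, reproducing the 0.2 from the slide; they are not the sequences shown in the lecture figure.

```python
def identity(aligned_a, aligned_b):
    """Fraction of aligned positions that carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return matches / len(aligned_a)

# Hypothetical sequences: only the first four positions are identical.
seq_1 = "ACDEFGHIKLMNPQRSTVWY"
seq_2 = "ACDEGHIKLMNPQRSTVWYF"
print(identity(seq_1, seq_2))
```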

Similarity
• Unlike identity, similarity is not as simple to calculate. Before similarity can be determined, it must first be defined how similar the building blocks of sequences are to each other.
• This is done with the help of similarity matrices, which specify the probability with which one sequence transforms into another sequence over time.
• This probability depends on the time elapsed and on the mutational rate of the nucleotides.

• For nucleotide sequences the simplest solution is an identity matrix (Fig. 4.2a).
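
As a sketch of what an identity matrix does in practice: every match scores 1 and every mismatch scores 0. These two values are assumptions for illustration; Fig. 4.2a may use different numbers. The sequences are the pair from the mutation-distance slide, compared here without gaps.

```python
NUCLEOTIDES = "ACGT"

# Identity matrix: 1 on the diagonal (match), 0 elsewhere (mismatch).
identity_matrix = {
    (x, y): 1 if x == y else 0
    for x in NUCLEOTIDES
    for y in NUCLEOTIDES
}

def score_alignment(seq_a, seq_b, matrix):
    """Sum the matrix scores over ungapped aligned positions."""
    return sum(matrix[(x, y)] for x, y in zip(seq_a, seq_b))

score = score_alignment("ACTGAT", "TCTATC", identity_matrix)
print(score)
```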

• For protein sequences, an identity matrix is not
sufficient to describe biological and evolutionary
processes.
• Amino acids are not exchanged with the same
probability as might be conceived theoretically.

• YOU CAN RECALL THE SYNONYMOUS AND
NON-SYNONYMOUS MUTATIONS

• For example, an exchange of aspartic acid for glutamic acid is frequently observed;
• an exchange of aspartic acid for tryptophan is seen rarely.
[Figure: the DNA change underlying the exchange]

• A second reason why the mutation of aspartic acid to glutamic acid occurs more often is that both have similar properties.

• In contrast, aspartic acid and tryptophan are chemically different: the hydrophobic tryptophan is frequently found in the center of proteins, whereas the hydrophilic aspartic acid occurs more often at the surface.

• Amino acid substitution matrices, therefore, describe the probability with which amino acids are exchanged in the course of evolution.
• The most commonly used amino acid scoring matrices are:
• PAM (Point Accepted Mutation; Dayhoff et al. 1978)
• BLOSUM (Blocks Substitution Matrix; Henikoff and Henikoff 1992)
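
The aspartic acid example can be checked against a scoring matrix. The sketch below hard-codes a handful of entries as they appear in the commonly distributed BLOSUM62 table; it is an excerpt for illustration only, and the helper name `substitution_score` is hypothetical.

```python
# Excerpt of BLOSUM62 scores (one-letter codes: D = Asp, E = Glu, W = Trp).
blosum62_excerpt = {
    ("D", "D"): 6,
    ("E", "E"): 5,
    ("W", "W"): 11,
    ("D", "E"): 2,    # chemically similar residues: positive score
    ("D", "W"): -4,   # chemically different residues: negative score
    ("E", "W"): -3,
}

def substitution_score(a, b, matrix):
    """Look up a symmetric score regardless of the order of the pair."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]

print(substitution_score("D", "E", blosum62_excerpt))
print(substitution_score("D", "W", blosum62_excerpt))
```

A positive score marks an exchange seen more often than expected by chance, a negative score one seen less often.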

Amino acid      3-letter  1-letter  Properties
Tryptophan      Trp       W         Hydrophobic
Aspartic acid   Asp       D         Hydrophilic, electrically charged (negative)
Glutamic acid   Glu       E         Hydrophilic, electrically charged (negative)

NUCLEOTIDE AND AMINO ACID
SEQUENCES ARE
EVOLUTIONARILY DIFFERENT
SO,
WE NEED DIFFERENT CRITERIA AND
MATRICES TO ANALYZE THEM

• Fig. 4.2a: for nucleotide sequences the simplest solution is an identity matrix.
• Fig. 4.2b: for amino acid sequences we need similarity matrices.

[Figure: two scored alignments, Score: 65 and Score: 19]

Calculation of a global alignment of
two similar protein sequences.
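
One standard way to compute a global alignment score is dynamic programming in the style of Needleman and Wunsch. The slide does not name the method used for its figure, so this is a sketch of that technique under stated assumptions: simple match/mismatch/gap scores of +1/-1/-1 rather than a protein substitution matrix, and a hypothetical function name.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score by dynamic programming (score only)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))
```

For protein sequences, the fixed match and mismatch values would be replaced by a lookup in a matrix such as PAM or BLOSUM.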


• Using MEGA to Calculate
Mutation Distance

Outgroup to root a phylogenetic tree
• The tree of human, chimpanzee, gorilla and orangutan genes is rooted with a baboon gene,
• because we know from the fossil record that the common ancestor of the four species split away from baboon earlier in geological time.
• Let’s see members of this group.

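Outgroup
[Figure: tree of Chimp, Human, Gorilla and Orangutan; scale bar 0.01]
[Figure: tree of Chimp, Human, Gorilla, Orangutan and Baboon; scale bar 0.02]

Outgroup
[Figure: tree of Kiwi, Ostrich, Swan, Ring-necked pheasant, Silver pheasant, Song sparrow, Parrot and Lizard]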

The Design of the phylogenetic TREE does not
change the evolutionary distance among the
various taxa represented.

[Figure: the same taxa drawn in different tree layouts: Kiwi, Struthio camelus (ostrich), Swan, Song sparrow, Ring-necked pheasant, Silver pheasant, Parrot]


Types of trees
• Rooted tree: includes the common ancestor (root node)
• Unrooted tree: represents the same phylogeny without the root node

Fig. 4.6. Phylogenetic tree of dopamine
receptor sequences.

Gene trees are not the same as
species trees

Examples of what can be inferred
from phylogenetic trees
(DNA, protein)
1. Which species are the closest living
relatives of modern humans?
2. Did the infamous Florida Dentist
infect his patients with HIV?
3. What is the relation between HIV
and SIV?

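Relatives of modern humans?

[Figure: two alternative trees of humans and the great apes]
• Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization: tips in the order Humans, Chimpanzees, Bonobos, Gorillas, Orangutans; time axis 14 to 0 MYA
• The pre-molecular view: tips in the order Gorillas, Chimpanzees, Bonobos, Orangutans, Humans; time axis 15-30 to 0 MYA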

2. Did the Florida Dentist infect his patients with HIV?

[Figure: phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people (local controls). From Ou et al. (1992) and Page & Holmes (1998)]
• Yes: the HIV sequences from Patients A, B, C, E and G fall within the clade of HIV sequences found in the dentist.
• No: the sequences from Patients D and F group with the local controls instead.

3. Relating Human HIV to Simian SIV
retroviruses

• Human immunodeficiency virus 1 (HIV-1) is pathogenic
• SIVs are not pathogenic in their normal hosts

The structure of HIV (figure labels)
• CD4 proteins on surface
• Phospholipid membrane
• Matrix
• Capsid
• Viral RNA
• Viral enzymes: reverse transcriptase, integrase, protease

IMAGE FROM: Medical Art Service, Munich / Wellcome Images.

HIV’s replication cycle (figure labels, in order)
1. HIV attaches to CD4 receptors on T-cell
2. Viral core of enzymes and RNA injected into cell
3. DNA transcribed from viral RNA
4. Double-stranded DNA produced
5. Viral integrase: DNA integrates with host chromosome
6. Transcription: viral RNA
7. Viral proteins
8. Viral protease cuts up proteins
9. New virus assembled
10. New virus leaves cell

Retrovirus genomes accumulate mutations relatively quickly
• Reverse transcriptase lacks an efficient proofreading activity, so it makes errors when it carries out RNA-dependent DNA synthesis.
• As a result, the molecular clock runs rapidly in retroviruses.

• Genomes that diverged quite recently display sufficient nucleotide dissimilarity for a phylogenetic analysis to be carried out.

• Even after less than 100 years, HIV and SIV genomes contain sufficient data for such an analysis.

• The starting point for this phylogenetic analysis is RNA extracted from virus particles, which is converted into DNA by RT-PCR.

RT-PCR
• Reverse transcription polymerase chain reaction (RT-PCR) is a variant of the polymerase chain reaction (PCR).
• It is a laboratory technique commonly used in molecular biology in which an RNA strand is reverse transcribed into its DNA complement (complementary DNA, or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using PCR.
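
The reverse transcription step can be mimicked on a string: each RNA base is paired with its DNA complement, and the result is written 5' to 3'. This is a toy sketch with a hypothetical function name; it models only the base pairing, not the enzyme or the PCR amplification.

```python
# RNA base -> complementary DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna):
    """Return the cDNA strand complementary to an RNA, written 5' to 3'."""
    # The complementary strand is antiparallel, so the RNA is read backwards.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))

cdna = reverse_transcribe("AUGC")
print(cdna)
```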

• This tree has a number of interesting features. First, it shows that different samples of HIV-1 have slightly different sequences, the samples as a whole forming a tight cluster, almost a star-like pattern, that radiates from one end of the unrooted tree.

• This star-like topology implies that the global AIDS epidemic began with a very small number of viruses, perhaps just one, which have spread and diversified since entering the human population.
• The closest relative to HIV-1 among primates is the SIV of chimpanzees, the implication being that this virus jumped across the species barrier between chimps and humans and initiated the AIDS epidemic.

• However, this epidemic did not
begin immediately: a relatively long
uninterrupted branch links the
center of the HIV-1 radiation with
the internal node leading to the
relevant SIV sequence, suggesting
that after transmission to
humans, HIV-1 underwent a latent
period when it remained restricted
to a small part of the global human
population, presumably in
Africa, before beginning its rapid
spread to other parts of the world.

• Other primate SIVs are less closely related to HIV-1, but one, the SIV from sooty mangabey, clusters in the tree with the second human immunodeficiency virus, HIV-2.
• It appears that HIV-2 was transferred to the human population independently of HIV-1, and from a different simian host. HIV-2 is also able to cause AIDS, but has not, as yet, become globally epidemic.

REFERENCES
• http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2010/Rydberg/Orthologs.html
