P53_Final_Presentation

By Jonah Kohen
Advised by Prof. Brian Chen

 1) Who are we and what do we do?
 2) DNA and p53: Basics
 3) Research problem
 4) Approach 1: electrostatic complementarity
across whole region.
 5) Approach 2: complementarity at the 14
binding amino acids.
 6) Approach 3: Cubes
 7) Conclusions

 Jonah Kohen – B.S. Computer Engineering
(expected May 2015), Lehigh University
 Sara Grogan – B.S. IBE Chemical Engineer,
minor Biotechnology (expected May 2017),
Lehigh University
 Prof. Brian Chen – P.C. Rossin Assistant
Professor of Computer Science and
Engineering at Lehigh University

 Prof. Chen works in structural bioinformatics
and has created programs modeling
biomolecule interactions to aid in
bioinformatics research.
 I chose to study p53 because it plays a pivotal
role in cancer.

 Electrostatic analysis of the interaction
between DNA and p53 mutants can be used
to predict whether or not the mutation will
lead to cancer.

 The DNA in the nucleus of cells contains the
instructions for the production of proteins.
p53 is one such protein.
 Proteins are large molecules composed of
amino acids that perform certain functions
like cellular metabolic processes.

 p53 performs its function by binding to a
particular region on the DNA.

 p53 becomes activated in response to cellular
stress, for example DNA damage.
 p53 in turn activates DNA repair proteins,
suspends cell division, and initiates apoptosis
(cell death).
 These damage control mechanisms help
prevent cancer.
Repair Suspend Division Cell Death

 When functioning normally, p53 suppresses
the proliferation of cancer cells.
 Mutations to p53 may hinder this function.
 Mutations in the p53 tumor suppressor are
the most frequently observed genetic
alterations in human cancer.
 Each of these variants may carry one or
several substitutions.
[Figure: three p53 proteins interacting with DNA.]

 A p53 mutant is “active” if this variant of p53
is functioning normally.
 A p53 mutant is “inactive” if this variant of
p53 is unable to function normally.
Active: Good Inactive: Bad

 Goal: design an algorithm that can reliably
classify a p53 mutant as active or inactive.
 Several predictors have been proposed, but
most are unreliable.
 A reliable predictor would help us diagnose a
mutation as cancerous or not.

 p53 is composed of 393 amino acids. The
region responsible for DNA binding spans
amino acids 102-292.
 Within this region, I look only at the 14
amino acids that directly bind to DNA.

 Previous research has found that these 14
amino acids are: a119, a276, n239, n247,
r248, r273, r280, r283, c275, c277, l120,
m243, s121, and s241.
[Figure: binding amino acids labeled on the structure: l120, s121, n239, s241, r248, r273, c277, r283.]

 If the 273rd amino acid of p53 (arginine) is
changed to histidine (abbreviated r273h), p53
is inactivated.
r273h

 If the 273rd amino acid of p53 is changed to
histidine AND the 263rd amino acid of p53
(asparagine) is changed to valine (abbrev.
r273h/n263v), p53 remains functional.
r273h/n263v

 Why are mutants like r273h inactive while
other mutants like r273h/n263v are active?
[Figure: r273h and r273h/n263v side by side, with the difference between them marked as unknown.]

 Data set of 541 pdb files, each one describing
a different p53 mutant.
 143 active mutants, 77 involving the 14 key
amino acids.
 398 inactive mutants, 155 involving the 14
key amino acids.
 Activity determined by in vivo analysis.
 Source: Richard H. Lathrop, UC Irvine.

[Figure: PDB file of unmutated p53 viewed in PyMOL (3D structure), alongside a segment of the same PDB file viewed as text.]

Mutations involving any of the 14 binding amino acids (one or more), for example r273h/s240q,
lead to observable changes in structural and electrostatic properties,
which we want to use to reliably classify p53 mutants as active or inactive.

 p53 and DNA are both primarily negatively
charged molecules. However, p53 has
positive pockets that interact with the
negatively charged DNA.

 The electrostatic complementarity region is
defined as a region where negatively charged
DNA overlaps with positively charged p53.
[Figure: isopotential surfaces at +1, +2, +3 and -1, -2, -3, with the protein +1 isopotential and the DNA -1 isopotential labeled.]

 Compute +1 isopotential around p53 mutant,
-1 isopotential around DNA, and find the
electrostatic complementarity region between
the two.
 Isopotentials generated by “surfaceExtractor”,
Boolean intersections generated by VASP
(Volumetric Analysis of Surface Properties).
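
As a rough illustration of the volumetric idea (not the surfaceExtractor or VASP implementation, whose internals are not described in these slides), the complementarity volume can be approximated on a voxel grid: count the grid cells where the protein potential is at or above +1 and the DNA potential is at or below -1. The function name, the dict-based grid layout, and the toy values below are assumptions made for this sketch.

```python
def complementarity_volume(protein_potential, dna_potential, voxel_volume,
                           protein_level=1.0, dna_level=-1.0):
    """Approximate the volume of the electrostatic complementarity region.

    Both potentials are dicts mapping (i, j, k) grid indices to a potential
    value sampled on the same grid (a hypothetical layout for this sketch).
    A voxel counts if it lies inside the protein's +1 isopotential AND
    inside the DNA's -1 isopotential.
    """
    count = 0
    for voxel, p in protein_potential.items():
        d = dna_potential.get(voxel)
        if d is not None and p >= protein_level and d <= dna_level:
            count += 1
    return count * voxel_volume


# Toy values, not taken from the p53 data set.
protein = {(0, 0, 0): 1.5, (1, 0, 0): 0.4, (2, 0, 0): 2.0}
dna = {(0, 0, 0): -1.2, (1, 0, 0): -3.0, (2, 0, 0): -0.5}
volume = complementarity_volume(protein, dna, voxel_volume=0.5 ** 3)
print(volume)  # only voxel (0, 0, 0) satisfies both conditions
```

A finer grid gives a closer approximation at the cost of more voxels to test.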

 The -1 isopotential region of the p53-
binding DNA motif.
[Figure: DNA -1 isopotential; yellow indicates negative charge.]

Intersection of DNA and unmutated (wild type) p53

 The volume of the electrostatic
complementarity region between amino acids
102-292 is computed for every mutant.
This picture is the
visual representation
of a volumetric
computation.

[Histogram: relative frequency (0 to 3/10) of active and inactive mutants by complementarity region volume (360 to 600).]

 Given only a complementarity region
volume, it is usually impossible to determine
whether the mutant is active or inactive.
[Histogram: relative frequency (0 to 3/10) of active and inactive mutants by complementarity region volume (360 to 600), annotated with three sample mutants: volume = 380, effect = inactive; volume = 580, effect = inactive; volume = 480, effect = ?]

 What happens if we only analyze the volume
near the 14 amino acids directly involved in
DNA binding?
 Again, these amino acids are: a119, a276,
n239, n247, r248, r273, r280, r283, c275,
c277, l120, m243, s121, and s241.

 From the entire electrostatic complementarity
region, we keep all subregions that are within
five Angstroms of any of the 14 binding
amino acids. The rest of the region is
discarded.
[Figure: electrostatic complementarity region for unmutated p53 within 5 Angstroms of the 14 binding amino acids.]
 Results of this test indicate that if the region
volume is below a certain threshold, the mutant
is always inactive in this data set.
 These lower volumes correspond to real-life
mutations of r248 and r283.
[Histogram: relative frequency (0 to 0.25) of active and inactive mutants by region volume near the 14 binding amino acids.]

[Figure: the Approach 1 histogram (volume 360 to 600, relative frequency 0 to 3/10) and the Approach 2 histogram (relative frequency 0 to 0.25) side by side, each showing active and inactive mutants.]

 We divide the electrostatic complementarity
(EC) region into 39 subregions defined by cubes.
We calculate the volume of electrostatic
complementarity in each cube.
[Figure: intersection region for wild-type p53 separated into cubes.]

 Each mutation is an observation with 39
features. Each feature is the volume of the
electrostatic complementarity region
contained in a particular cube.
Mutation Name   Cube 11     Cube 60     Cube 101
r273r           41.082862   78.972403   16.34071
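
One way to obtain per-cube features, sketched under assumptions: if the complementarity region is available as voxel centers, each voxel can be assigned to a cube by integer division of its coordinates, and the voxel volumes summed per cube. The cube edge length, the voxel representation, and the function name are hypothetical. The slides do not give the cube size or how the 39 cubes were laid out.

```python
import math
from collections import defaultdict


def cube_volumes(voxel_centers, voxel_volume, cube_edge):
    """Sum the region volume that falls inside each axis-aligned cube.

    Returns a dict mapping a cube index (ix, iy, iz) to the volume of the
    region contained in that cube. Cubes with no region are absent.
    """
    volumes = defaultdict(float)
    for x, y, z in voxel_centers:
        cube = (math.floor(x / cube_edge),
                math.floor(y / cube_edge),
                math.floor(z / cube_edge))
        volumes[cube] += voxel_volume
    return dict(volumes)


# Toy voxels, not taken from the p53 data set.
voxels = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (10.5, 0.5, 0.5)]
features = cube_volumes(voxels, voxel_volume=1.0, cube_edge=10.0)
print(features)
```

Each mutation's feature vector is then the list of these volumes in a fixed cube order, with 0 for cubes that contain none of the region.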

 The collection of all observations is a matrix
with 39 columns (cube volumes) and 232
rows (mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635

 A “true positive” is a correctly recognized
active mutant. A “true negative” is a correctly
recognized inactive mutant.

 Volumetric analysis of electrostatic
complementarity regions shows promise
for the study of p53.
 A prediction algorithm with high sensitivity is
possible from analysis of cube data.
 Our hope is that, using more refined analysis
techniques, the prediction algorithm can be
made even more specific.

 Support Vector Machines (SVM) can be used
to compute a decision boundary that separates
active mutants from inactive mutants based
on cube volumes.
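
To make the SVM idea concrete, here is a minimal linear SVM trained by subgradient descent on the regularized hinge loss. It is a teaching sketch, not the analysis planned in this work: the function names, hyperparameters, and two-feature toy data are all assumptions, and a real study would more likely use an established SVM library on the 39 scaled cube volumes.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Fit w, b for the decision function sign(w . x + b).

    X is a list of feature lists, y a list of labels in {+1, -1}
    (+1 = active, -1 = inactive in this sketch).
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: hinge loss is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, already-centered features, not real cube volumes.
X = [[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.8, -1.2]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```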

[Histogram "Intersection Volumes +1 Relative Frequency": relative frequency (0 to 3/10) of active and inactive mutants by volume (360 to 600).]
[Histogram "Binding Acid regions +1 Relative Frequency": relative frequency (0 to 0.3) of active and inactive mutants by volume (210 to 315, then "More").]

 In both histograms, all active mutants fall
between a lower and an upper volume bound.
 If these thresholds can be detected, mutants
that fall outside them can be accurately
predicted to be inactive.
 Using the entire intersection volume for this
purpose is less accurate.

 For each cube, we calculate the lower
threshold (minimum volume across all
mutations in the first training set, which
contains only actives) and the upper threshold
(maximum volume across the same mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635
Lower threshold 5.358244 10.26282 37.30913
Upper threshold 7.445915 10.58604 42.42635
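
The threshold step is a per-cube minimum and maximum. The sketch below reproduces the two threshold rows of the table from its three example mutations; the dict layout and the function name are assumptions made for illustration.

```python
# Volumes copied from the example table above.
volumes = {
    "r158l_s227f_n239y": {"Cube 3": 7.445915, "Cube 4": 10.34335, "Cube 11": 41.34778},
    "r249m_n235k_n239y": {"Cube 3": 5.358244, "Cube 4": 10.58604, "Cube 11": 37.30913},
    "r273c_d281g_e285g": {"Cube 3": 7.348197, "Cube 4": 10.26282, "Cube 11": 42.42635},
}


def cube_thresholds(volumes):
    """Return {cube: (lower, upper)} from the volumes of the training mutations."""
    cubes = list(next(iter(volumes.values())))
    return {
        cube: (min(row[cube] for row in volumes.values()),
               max(row[cube] for row in volumes.values()))
        for cube in cubes
    }


thresholds = cube_thresholds(volumes)
for cube, (lower, upper) in thresholds.items():
    print(cube, lower, upper)
```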

 The remaining data set (everything outside
the first training set) is divided into two
parts: a second training set to determine
which cubes indicate inactivity, and a test set
to analyze the predictive power of the chosen
cubes.
 For each mutation, 39 true/false values (one
for each cube) are computed. Each cube with
a volume above the upper threshold or below
the lower threshold set for that cube in
stage 1 gets a “true” value.
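
A minimal sketch of the true/false computation, using the three example thresholds from the earlier table. The mutant volumes here are invented to show both outcomes, and the function name is hypothetical.

```python
# (lower, upper) thresholds for three of the cubes, from the example table.
thresholds = {
    "Cube 3": (5.358244, 7.445915),
    "Cube 4": (10.26282, 10.58604),
    "Cube 11": (37.30913, 42.42635),
}


def threshold_violations(row, thresholds):
    """Return {cube: True} where the volume lies outside [lower, upper]."""
    return {
        cube: not (lower <= row[cube] <= upper)
        for cube, (lower, upper) in thresholds.items()
    }


# Hypothetical mutant: too small in Cube 3, in range in Cube 4, too large in Cube 11.
mutant = {"Cube 3": 4.9, "Cube 4": 10.4, "Cube 11": 43.0}
flags = threshold_violations(mutant, thresholds)
num_violations = sum(flags.values())
print(flags, num_violations)
```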

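[Flowchart: the total data set is split into Training Set 1 (only actives) and the data set minus the training set, which is split further into Training Set 2 and the test set.
Stage 1: thresholds are generated in Training Set 1.
Stage 2: the thresholds generated in Training Set 1 are used to extract inactivity indicators from Training Set 2.
The inactivity indicators are then applied to the test set.]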

 Active mutants have, on average, far fewer
“true” values (threshold violations) than
inactive mutants. Depending on the training
and test sets, the worst active mutants had
between 3 and 6 true values, while
inactives can have more than 13.
Number of mutants with n threshold violations:

n violations     Inactive mutants   Active mutants
1                46                 6
2                29                 2
3                25                 1
4                9                  0
5                5                  0
6                1                  0
7                0
8                1
9                0
10               0
11               0
12               0
13               0
Num violators    116                9

 A simple example: mutations A-E have a
violation in a given cube (1-5) wherever the
table shows “TRUE”.
 For each cube, the number of mutations with
a true value in that cube and 0, 1, 2, etc. other
violations is counted.
Mutation 1 2 3 4 5
A TRUE TRUE TRUE TRUE
B TRUE TRUE TRUE TRUE
C TRUE TRUE TRUE TRUE
D TRUE
E TRUE TRUE

Number of violators with n other violations:

Cube   0 other   1 other   2 other   3 other
1      3         3         2         2
2      3         2         2         2
3      3         3         3         3
4      3         3         3         3
5      3         3         2         2
 The cubes of interest (marked in red on the
slide; cubes 3 and 4 here) are those whose
violator count does not change between 0 and 3
other threshold violations.

 For each cube, the number of mutations in
the second training set with a threshold
violation in that cube was counted.
 Separate counts were generated for
mutations that had a violation in that cube
along with violations in 1, 2, 3, etc. other cubes.
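
The counting can be sketched on the simple A-E example. Two things here are assumptions. First, the extracted table no longer shows which cubes each TRUE belongs to, so the assignment below is one arrangement that is consistent with the row and column totals. Second, each count is read as "violators of this cube with at least k other violations", because that reading reproduces the example counts exactly.

```python
# Assumed cube assignments for mutations A-E (positions are not recoverable
# from the extracted table; this is one arrangement matching its totals).
violations = {
    "A": {2, 3, 4, 5},
    "B": {1, 3, 4, 5},
    "C": {1, 2, 3, 4},
    "D": {2},
    "E": {1, 5},
}


def count_by_other_violations(violations, cubes, max_others=3):
    """For each cube, count its violators having at least k other violations."""
    table = {}
    for cube in cubes:
        table[cube] = [
            sum(1 for hit in violations.values()
                if cube in hit and len(hit) - 1 >= k)
            for k in range(max_others + 1)
        ]
    return table


table = count_by_other_violations(violations, cubes=range(1, 6))

# Cubes whose count never drops are violated only by mutants with more than
# max_others violations in total: these are the candidate inactivity indicators.
indicators = [cube for cube, row in table.items() if row[0] > 0 and row[0] == row[-1]]
print(table)
print(indicators)
```

With these assignments the counts match the example table row for row, and cubes 3 and 4 come out as the cubes of interest.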

 If all active mutations have 3 or fewer threshold
violations, then cubes whose count stays the
same between 0 and 3 other violations
are violated only by inactive mutations.
 These inactivity indicators are used to classify
the mutants in the test set as either active or
inactive.

 The algorithm searches for inactive mutants.
Any mutant with more than three threshold
violations is automatically counted as
inactive.
 From the remaining pool, all mutants with
threshold violations in the cubes whose counts
do not change between 0 and 3
concomitant violations are also counted as
inactive.
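
The two rules can be written as a short decision function. This is a sketch under assumptions: the function name and inputs are hypothetical, the indicator cubes are taken as already identified from the second training set, and any mutant that neither rule flags is labeled active, which the slides imply but do not state.

```python
def classify(violated_cubes, indicator_cubes, max_active_violations=3):
    """Label a mutant from its set of threshold-violating cubes.

    violated_cubes: set of cube ids where the mutant violates a threshold.
    indicator_cubes: set of cube ids identified as inactivity indicators.
    """
    # Rule 1: more violations than any active mutant is assumed to have.
    if len(violated_cubes) > max_active_violations:
        return "inactive"
    # Rule 2: a violation in any inactivity-indicator cube.
    if violated_cubes & indicator_cubes:
        return "inactive"
    # Assumption: anything not flagged is treated as active.
    return "active"


indicator_cubes = {3, 4}  # e.g. the cubes of interest from the simple example
print(classify({1, 2, 3, 4}, indicator_cubes))
print(classify({3}, indicator_cubes))
print(classify({1}, indicator_cubes))
```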

 Cubes that guarantee inactivity have still not
been completely identified.
 The best inactivity indicators generated thus
far are not the most sensitive.
 Next steps involve either a refinement of the
threshold violations algorithm or the use of
other methods.
