2. 1) Who are we and what do we do?
2) DNA and p53: Basics
3) Research problem
4) Approach 1: electrostatic complementarity
across whole region.
5) Approach 2: complementarity at the 14
binding amino acids.
6) Approach 3: Cubes
7) Conclusions
3. Jonah Kohen – B.S. Computer Engineering
(expected May 2015), Lehigh University
Sara Grogan – B.S. IBE Chemical Engineer,
minor Biotechnology (expected May 2017),
Lehigh University
Prof. Brian Chen – P.C. Rossin Assistant
Professor of Computer Science and
Engineering at Lehigh University
4. Prof. Chen works in structural bioinformatics
and has created programs modeling
biomolecule interactions to aid in
bioinformatics research.
I chose to study p53 because it plays a pivotal
role in cancer.
5. Electrostatic analysis of the interaction
between DNA and p53 mutants can be used
to predict whether or not the mutation will
lead to cancer.
7. The DNA in the nucleus of cells contains the
instructions for the production of proteins.
p53 is one such protein.
Proteins are large molecules, composed of
amino acids, that perform functions such as
cellular metabolic processes.
8. p53 performs its function by binding to a
particular region on the DNA.
9. p53 becomes activated in response to cellular
stress, for example DNA damage.
p53 in turn activates DNA repair proteins,
suspends cell division, and initiates apoptosis
(cell death).
These damage control mechanisms help
prevent cancer.
Figure: p53 responses (Repair, Suspend Division, Cell Death).
10. When functioning normally, p53 suppresses
the proliferation of cancer cells.
Mutations to p53 may hinder this function.
Mutations in the p53 tumor suppressor are
the most frequently observed genetic
alterations in human cancer.
Each of these variants may carry one or
several substitutions.
Figure: three p53 proteins interacting with DNA.
11. A p53 mutant is “active” if this variant of p53
is functioning normally.
A p53 mutant is “inactive” if this variant of
p53 is unable to function normally.
Active: Good Inactive: Bad
12. Goal: design an algorithm that can reliably
classify a p53 mutant as active or inactive.
Several predictors have been proposed, but
most are unreliable.
A reliable predictor would help us diagnose a
mutation as cancerous or not.
13. p53 is composed of 393 amino acids. The
region responsible for DNA binding spans
amino acids 102-292.
Within this region, I am looking only at the 14
amino acids that directly bind to DNA.
14. Previous research has found that these 14
amino acids are: a119, a276, n239, n247,
r248, r273, r280, r283, c275, c277, l120,
m243, s121, and s241.
Figure labels: l120, s121, n239, s241, r248, r273, c277, r283.
16. If the 273rd amino acid of p53 (arginine) is
changed to histidine (abbreviated r273h), p53
is inactivated.
r273h
17. If the 273rd amino acid of p53 is changed to
histidine AND the 263rd amino acid of p53
(asparagine) is changed to valine (abbrev.
r273h/n263v), p53 remains functional.
r273h/n263v
18. Why are mutants like r273h inactive while
other mutants like r273h/n263v are active?
r273h r273h/n263v
???
19. Data set of 541 pdb files, each one describing
a different p53 mutant.
143 active mutants, 77 involving the 14 key
amino acids.
398 inactive mutants, 155 involving the 14
key amino acids.
Activity determined by in vivo analysis.
Source: Richard H. Lathrop, UC Irvine.
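The step that narrows the data set to mutants involving the 14 key amino acids can be sketched as below. This is a minimal illustration rather than the code used in the project: the helper names `substituted_positions` and `touches_binding_site` are hypothetical, the underscore or slash separated naming (for example `r273h_n263v`) is assumed from the mutant names shown in the deck, and `d281g_e285g` is a made-up name used only to show a rejected case.

```python
import re

# Positions of the 14 DNA-binding amino acids listed on slide 14.
BINDING_POSITIONS = {119, 120, 121, 239, 241, 243, 247, 248,
                     273, 275, 276, 277, 280, 283}

def substituted_positions(mutant_name):
    """Return the residue numbers substituted in a name like 'r273h_n263v'."""
    positions = set()
    for part in re.split(r"[_/]", mutant_name.lower()):
        match = re.fullmatch(r"[a-z](\d+)[a-z]", part)
        if match is None:
            raise ValueError(f"unrecognized substitution: {part!r}")
        positions.add(int(match.group(1)))
    return positions

def touches_binding_site(mutant_name):
    """True if at least one substitution falls on a binding amino acid."""
    return bool(substituted_positions(mutant_name) & BINDING_POSITIONS)

# Example names; the last one is hypothetical and touches no binding residue.
names = ["r273h", "r273h_n263v", "r158l_s227f_n239y", "d281g_e285g"]
selected = [n for n in names if touches_binding_site(n)]
```

Applied to the full data set, a filter of this kind would be expected to keep the 77 active and 155 inactive mutants quoted above.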
20. PDB file of unmutated p53, viewed in PyMOL (3D structure),
shown alongside a segment of the same PDB file viewed as text.
21. Mutations involving any of the 14 binding amino acids (one or more), for example r273h/s240q,
lead to observable changes in structural and electrostatic properties,
which we want to use to reliably classify p53 mutants as active or inactive.
22. p53 and DNA are both primarily negatively
charged molecules. However, p53 has
positive pockets that interact with the
negatively charged DNA.
23. The electrostatic complementarity region is
defined as a region where negatively charged
DNA overlaps with positively charged p53.
Figure: isopotential levels from +3 to -3, with the protein +1 isopotential and the DNA -1 isopotential labeled.
24. Compute +1 isopotential around p53 mutant,
-1 isopotential around DNA, and find the
electrostatic complementarity region between
the two.
Isopotentials generated by “surfaceExtractor”,
Boolean intersections generated by VASP
(Volumetric Analysis of Surface Properties).
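The idea of intersecting the two isopotential regions can be illustrated on a voxel grid. The sketch below does not use surfaceExtractor or VASP and makes no claim about how they work internally; it assumes that both potentials have already been sampled on a shared grid, in units of kT/e, and every function name and value in it is hypothetical.

```python
def isopotential_region(potential, level):
    """Voxels at or beyond an isopotential level (kT/e).

    A positive level keeps voxels with potential >= level;
    a negative level keeps voxels with potential <= level.
    """
    if level >= 0:
        return {v for v, phi in potential.items() if phi >= level}
    return {v for v, phi in potential.items() if phi <= level}

def complementarity_region(protein_potential, dna_potential):
    """Boolean intersection of the protein +1 and DNA -1 regions."""
    positive = isopotential_region(protein_potential, +1.0)
    negative = isopotential_region(dna_potential, -1.0)
    return positive & negative

def region_volume(voxels, voxel_edge=1.0):
    """Volume of a voxel set, in cubic units of voxel_edge."""
    return len(voxels) * voxel_edge ** 3

# Toy potentials on a shared grid, keyed by (i, j, k) voxel index.
protein = {(0, 0, 0): 1.5, (1, 0, 0): 0.4, (2, 0, 0): 2.0, (3, 0, 0): -0.5}
dna = {(0, 0, 0): -1.2, (1, 0, 0): -3.0, (2, 0, 0): -0.2, (3, 0, 0): -2.0}
region = complementarity_region(protein, dna)
volume = region_volume(region, voxel_edge=0.5)
```

Only the voxel where the protein potential is at least +1 and the DNA potential is at most -1 survives the intersection.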
25. The -1 isopotential region of the p53-
binding DNA motif.
Figure: DNA -1 isopotential; yellow indicates negative charge.
29. The volume of the electrostatic
complementarity region between amino acids
102-292 is computed for every mutant.
This picture is the visual representation of a volumetric computation.
31. Given only the complementarity region
volume of a random mutant, it is usually
impossible to determine whether the mutant
is active or inactive.
Figure: histogram of relative frequency (0 to 3/10) against complementarity region volume (360 to 600), for active and inactive mutants.
Annotated examples: a random mutant with volume = 380 is inactive; one with volume = 580 is inactive; one with volume = 480 cannot be determined.
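A relative-frequency histogram of this kind can be computed per class as sketched below. The volumes are invented purely for illustration (they are not the measured data), and the bin start of 360 and width of 20 are assumptions read off the axis ticks.

```python
from collections import Counter

def relative_frequencies(volumes, bin_start, bin_width):
    """Map each bin's lower edge to the fraction of volumes in that bin."""
    counts = Counter(
        bin_start + bin_width * int((v - bin_start) // bin_width)
        for v in volumes
    )
    total = len(volumes)
    return {edge: n / total for edge, n in sorted(counts.items())}

# Hypothetical volumes, for illustration only (not the real data).
active = [465.0, 472.0, 481.0, 455.0]
inactive = [382.0, 470.0, 478.0, 585.0]
active_hist = relative_frequencies(active, bin_start=360, bin_width=20)
inactive_hist = relative_frequencies(inactive, bin_start=360, bin_width=20)

# Bins populated by both classes cannot separate active from inactive.
ambiguous_bins = sorted(set(active_hist) & set(inactive_hist))
```

In this toy data the 460 bin holds half of each class, which mirrors the overlap that makes a single whole-region volume a poor classifier.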
33. What happens if we only analyze the volume
near the 14 amino acids directly involved in
DNA binding?
Again, these amino acids are: a119, a276,
n239, n247, r248, r273, r280, r283, c275,
c277, l120, m243, s121, and s241.
34. From the entire electrostatic complementarity
region, we take all subregions that are within
5 Angstroms of any of the 14 binding
amino acids. The rest of the region is
discarded.
Figure: electrostatic complementarity region for unmutated p53 within 5 Angstroms of the 14 binding amino acids.
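The restriction to the neighborhood of the binding amino acids might look like the sketch below. It reads the cutoff as a distance of 5 Angstroms from any atom of a binding amino acid, which is an assumption, and the coordinates and function names are hypothetical.

```python
import math

def near_any(point, atoms, cutoff):
    """True if point lies within cutoff of at least one atom position."""
    return any(math.dist(point, atom) <= cutoff for atom in atoms)

def restrict_to_binding_site(region_points, binding_atoms, cutoff=5.0):
    """Keep only the region points close to the binding amino acids."""
    return [p for p in region_points if near_any(p, binding_atoms, cutoff)]

# Hypothetical coordinates in Angstroms, for illustration only.
binding_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
region_points = [(3.0, 3.0, 0.0), (5.0, 5.0, 0.0), (10.0, 0.0, 4.9)]
kept = restrict_to_binding_site(region_points, binding_atoms)
```

The middle point is about 7.1 Angstroms from both atoms, so it is discarded along with the rest of the region.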
35. Results of this test indicate that every
mutant whose region volume falls below a
certain threshold is inactive.
These lower volumes correspond to real-life
mutations of r248 and r283.
Figure: histogram of relative frequency (0 to 0.25) against complementarity region volume, for active and inactive mutants.
38. We divide the electrostatic complementarity
(EC) region into 39 subregions defined by
cubes. We calculate the volume of
electrostatic complementarity in each cube.
Figure: intersection region for wild-type p53 separated into cubes.
39. Each mutation is an observation with 39
features. Each feature is the volume of the
electrostatic complementarity region
contained in a particular cube.
Mutation Name Cube 11 Cube 60 Cube 101
r273r 41.082862 78.972403 16.34071
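One way to turn a complementarity region into per-cube volume features is to bin voxel centers on a regular grid of cubes, as sketched below. How the 39 cubes were actually laid out is not stated in the slides, so the grid, the edge lengths, and the function name are all assumptions.

```python
from collections import defaultdict

def cube_volumes(region_voxels, voxel_edge, cube_edge):
    """Sum the region volume that falls inside each cube of a regular grid.

    region_voxels holds voxel centers (x, y, z); each voxel contributes
    voxel_edge ** 3 to the cube that contains its center.
    """
    volumes = defaultdict(float)
    for x, y, z in region_voxels:
        cube = (int(x // cube_edge), int(y // cube_edge), int(z // cube_edge))
        volumes[cube] += voxel_edge ** 3
    return dict(volumes)

# Hypothetical voxel centers in Angstroms, for illustration only.
voxels = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (4.5, 0.5, 0.5)]
features = cube_volumes(voxels, voxel_edge=1.0, cube_edge=4.0)
```

Each mutant then yields one feature vector, with one volume per cube.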
40. The collection of all observations is a matrix
with 39 columns (cube volumes) and 232
rows (mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635
41. A “true positive” is a correctly recognized
active mutant. A “true negative” is a correctly
recognized inactive mutant.
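With "active" as the positive class, sensitivity and specificity follow directly from these definitions. The labels below are invented for illustration, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive="active"):
    """Count true/false positives and negatives for paired labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def sensitivity(c):
    """Fraction of active mutants that were recognized as active."""
    return c["tp"] / (c["tp"] + c["fn"])

def specificity(c):
    """Fraction of inactive mutants that were recognized as inactive."""
    return c["tn"] / (c["tn"] + c["fp"])

# Hypothetical labels, for illustration only.
actual = ["active", "active", "inactive", "inactive", "inactive"]
predicted = ["active", "active", "active", "inactive", "active"]
counts = confusion_counts(actual, predicted)
```

In this toy case every active mutant is found (sensitivity 1.0), but two of the three inactive mutants are mislabeled as active, so specificity is only one third.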
43. Volumetric analysis of electrostatic
complementarity regions holds promising
applications to p53 study.
A prediction algorithm with high sensitivity is
possible from analysis of cube data.
Our hope is that, using more refined analysis
techniques, the prediction algorithm can be
made even more specific.
44. Support Vector Machines (SVM) can be used
to learn a decision function that separates
active mutants from inactive mutants based
on cube volumes.
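To make the idea concrete, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. The slides do not say which SVM variant, kernel, or library was used, so this is a stand-in chosen for brevity; the features are hypothetical deviations of two cube volumes from a reference value, and the hyperparameters are arbitrary.

```python
def train_linear_svm(samples, labels, epochs=200, rate=0.1, reg=0.01):
    """Fit w, b by subgradient descent on the regularized hinge loss.

    labels must be +1 (active) or -1 (inactive).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Misclassified or inside the margin: move toward the sample.
                w = [wi + rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += rate * y
            else:
                # Correct with enough margin: only apply regularization.
                w = [wi - rate * reg * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 (active) or -1 (inactive) for a feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical volume deviations for two cubes, for illustration only.
samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
           (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

On real data the 39 cube volumes would replace the two toy features, and a held-out test set would be needed to judge the classifier.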
49. Both histograms show a lower and an upper
bound between which all active mutants
fall.
If these thresholds can be detected, it is
possible to accurately predict inactive p53
mutants.
Using the entire intersection volume for this
purpose is not as accurate.
50. For each cube, we calculate the lower
threshold (minimum volume across all
mutations in the first training set) and the
upper threshold (maximum volume across
the same mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635
Lower threshold 5.358244 10.26282 37.30913
Upper threshold 7.445915 10.58604 42.42635
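The threshold computation is a per-column minimum and maximum. The sketch below reproduces the lower and upper threshold rows from the three example mutations in the table; the function name and the dictionary layout are assumptions.

```python
def cube_thresholds(volume_rows):
    """Per-cube (lower, upper) thresholds from a table of cube volumes.

    volume_rows maps mutation name -> {cube number: volume}.
    """
    cubes = sorted(next(iter(volume_rows.values())))
    return {
        cube: (
            min(row[cube] for row in volume_rows.values()),
            max(row[cube] for row in volume_rows.values()),
        )
        for cube in cubes
    }

# The three example rows shown in the table.
training_set_1 = {
    "r158l_s227f_n239y": {3: 7.445915, 4: 10.34335, 11: 41.34778},
    "r249m_n235k_n239y": {3: 5.358244, 4: 10.58604, 11: 37.30913},
    "r273c_d281g_e285g": {3: 7.348197, 4: 10.26282, 11: 42.42635},
}
thresholds = cube_thresholds(training_set_1)
```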
51. The remaining data set is divided into two
parts: a second training set to determine
which cubes indicate inactivity, and a test set
to analyze the predictive power of the chosen
cubes.
For each mutation, 39 true/false values (one
for each cube) are computed. Each cube with
a volume above or below the thresholds set
for that particular cube in stage 1 gets a
“true” value.
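Computing the true/false values for one mutation could be sketched as follows. The thresholds are the ones from the slide 50 example table, while the mutant volumes and the function name are hypothetical.

```python
def violation_flags(volumes, thresholds):
    """Map each cube to True if its volume lies outside [lower, upper]."""
    return {
        cube: not (lower <= volumes[cube] <= upper)
        for cube, (lower, upper) in thresholds.items()
    }

# Thresholds from the slide 50 example; the mutant volumes are hypothetical.
thresholds = {3: (5.358244, 7.445915), 4: (10.26282, 10.58604),
              11: (37.30913, 42.42635)}
mutant = {3: 6.1, 4: 9.8, 11: 45.0}
flags = violation_flags(mutant, thresholds)
num_violations = sum(flags.values())
```

Here cube 4 falls below its lower threshold and cube 11 exceeds its upper threshold, so the mutant has two threshold violations.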
52. Diagram of the data split:
Total Dataset -> Training Set 1 (only actives) + Dataset minus training set
Dataset minus training set -> Training Set 2 + Test Set
Stage 1: Thresholds generated in Training Set 1 are used to extract inactivity indicators from Training Set 2.
Stage 2: Inactivity indicators are applied to the test set.
53. Active mutants have, on average, far fewer
“true” values (threshold violations) than
inactive mutants. Depending on the training
and test sets, the worst active mutants had
anywhere from 3-6 true values, while
inactives can have more than 13.
# of Inactive Mutants with n threshold violations
1 violation: 46
2 violations: 29
3 violations: 25
4 violations: 9
5 violations: 5
6 violations: 1
7 violations: 0
8 violations: 1
9 violations: 0
10 violations: 0
11 violations: 0
12 violations: 0
13 violations: 0
Num violators: 116
# of Active Mutants with n threshold violations
1 violation: 6
2 violations: 2
3 violations: 1
4 violations: 0
5 violations: 0
6 violations: 0
Num violators: 9
54. A simple example: a “TRUE” in the table
means that the mutation (A-E) has a threshold
violation in that cube (1-5).
For each cube, we count the number of
mutations with a true value in that cube and
0, 1, 2, etc. other violations.
Mutation 1 2 3 4 5
A TRUE TRUE TRUE TRUE
B TRUE TRUE TRUE TRUE
C TRUE TRUE TRUE TRUE
D TRUE
E TRUE TRUE
55. Number of violators w/ k other violations, per cube:
Cube  0 other  1 other  2 other  3 other
1     3        3        2        2
2     3        2        2        2
3     3        3        3        3
4     3        3        3        3
5     3        3        2        2
The cubes marked in red (cubes 3 and 4 in this
table) are the cubes of interest. The number of
violators in these cubes does not change
between 0 and 3 other threshold violations.
56. For each cube, the number of mutations in
the second training set with a threshold
violation in that cube was counted.
Separate counts were generated for
mutations that had a violation in that cube
along with 1, 2, 3, etc. other cubes.
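The counting can be sketched as below. Two things here are assumptions: the counts in the slide 55 example are reproduced when "k other violations" is read as "at least k other violations", and the assignment of TRUE values to cubes for mutations A-E is one hypothetical layout consistent with that table, since the column positions are not recoverable from the slide text. The function names are also hypothetical.

```python
def violator_counts(violations, cubes, max_others):
    """For each cube, count violators with at least k other violations.

    violations maps mutation name -> set of violated cubes.
    Returns {cube: [count for k = 0 .. max_others]}.
    """
    table = {}
    for cube in cubes:
        others = [len(v) - 1 for v in violations.values() if cube in v]
        table[cube] = [sum(1 for n in others if n >= k)
                       for k in range(max_others + 1)]
    return table

def stable_cubes(table):
    """Cubes whose nonzero count is the same in every column."""
    return [c for c, row in table.items() if row[0] > 0 and len(set(row)) == 1]

# Hypothetical assignment of the TRUE values for mutations A-E.
violations = {
    "A": {1, 2, 3, 4},
    "B": {1, 3, 4, 5},
    "C": {2, 3, 4, 5},
    "D": {2},
    "E": {1, 5},
}
table = violator_counts(violations, cubes=range(1, 6), max_others=3)
indicators = stable_cubes(table)
```

Under these assumptions cubes 3 and 4 keep a count of 3 in every column, meaning they are violated only by mutations that also have at least three other violations.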
57. If all active mutations have 3 or fewer
threshold violations, then cubes that sustain
the same count between 0 and 3 other
violations are violated only by inactive
mutations.
These inactivity indicators are used to classify
the mutants in the test set as either active or
inactive.
58. The algorithm searches for inactive mutants.
Any mutant with more than three threshold
violations is automatically counted as
inactive.
From the remaining pool, all mutants with
threshold violations in the cubes that do not
have their counts change between 0 and 3
concomitant violations are also counted as
inactive.
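The two rules above amount to a short decision function, sketched here. The indicator cubes and test mutants are invented for illustration, the function name is hypothetical, and labeling every remaining mutant as "active" is an assumption about what happens to mutants that neither rule catches.

```python
def classify(violated_cubes, indicator_cubes, max_active_violations=3):
    """Label a mutant 'inactive' or 'active' from its threshold violations."""
    if len(violated_cubes) > max_active_violations:
        return "inactive"
    if violated_cubes & indicator_cubes:
        return "inactive"
    # Assumption: mutants caught by neither rule are treated as active.
    return "active"

# Hypothetical indicator cubes and test mutants, for illustration only.
indicator_cubes = {3, 4}
test_set = {
    "m1": {1, 2, 5, 7},   # more than three violations
    "m2": {4},            # violates an indicator cube
    "m3": {1, 5},         # few violations, none in an indicator cube
    "m4": set(),          # no violations at all
}
labels = {name: classify(v, indicator_cubes) for name, v in test_set.items()}
```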
59. Cubes that guarantee inactivity have still not
been completely identified.
The best inactivity indicators generated thus
far are not the most sensitive.
Next steps involve either a refinement of the
threshold violations algorithm or the use of
other methods.
Editor's notes
I chose this title because that’s the title Brian submitted to the NSF when getting funding. SAY THAT!
Say that the various components of the title will be explained as we go along.
ADD SARA GROGAN!!
Say in your presentation: “Don’t worry, I’ll explain what each one of these terms means.”
Cell division suspended in the G1/S phase.
Apoptosis initiated when cell sustains too much damage.
Active/inactive classification is binary. Did not consider partial loss of function mutations.
Changes to these amino acids can drastically alter the structure of the binding site.
R273h/n263v is a rescue mutation. The phenomenon of mutations in p53 counteracting each other has been described as rescue mutations, and is heavily researched. We just concern ourselves with the activity classification of the protein.
Because we only analyze substitutions of DNA binding AAs, we use 77 of the actives and 155 of the inactives
A protein data bank (pdb) file contains twofold information about proteins: the sequence of the protein’s amino acids and the 3D structure of the protein.
1TSR.pdb contains three chains, each a copy of the p53 DNA binding domain. Chain B interacts directly with the binding region of DNA.
What conformational changes to p53’s contact site determine activity vs. inactivity? Can we generalize our findings to any mutation within the DNA binding region?
An electrostatic isopotential maps all regions around a molecule at a particular electrostatic potential. A +1 isopotential contains all regions with a potential of +1 kT/e or greater, and a -1 isopotential contains all regions with a potential of -1 kT/e or less, where k = Boltzmann constant, T = temperature, and e = electron charge.
The regions where the oppositely charged regions of the isopotential overlap are the regions of electrostatic complementarity, which govern binding.
Both of these programs were created by Prof. Brian Chen
We removed the -1 isopotential of p53 from the image and made the p53 +1 isopotential transparent
This picture is the visual representation of such a computation
Unfortunately, calculating the volume of the intersection region does not yield viable classification.
Explain every feature about this graph: x axis, y axis, blue vs red. For example: about ¼ of all active mutants have EC region volumes around 470 cubic angstroms. “Am I happy with this diagram? Of course not (transition to next slide).”
The choice of 5 angstroms produced the clearest results.
By looking at a small subset of the electrostatic complementarity region, we are able to get a much clearer division between active and inactive volumes, at least for some part of the histogram. Unfortunately, the biggest part of the histogram is still tangled. What this shows is that by subdividing the EC region into subsections, we can improve our results. Hence the next approach.
39 cubic sub regions were generated. The volume of the electrostatic complementarity region contained in each cube was computed.
Say the following: “After that, the learning algorithm I have developed becomes complicated, so allow me to go to the results. If anyone is interested in the algorithm I am at your disposal to take it offline.”
There are far too many false positives. Obviously I am still not happy with these results: while the sensitivity is good, the specificity is not.
In the graphs of both approaches, we notice the existence of very low volumes that only inactive mutants tend to have.
The Approach 1 graph indicates the existence of certain extraneous regions of complementarity that are detrimental to binding. In Approach 2 these regions are removed because we are only looking at the complementarity region needed for binding.
These maximum and minimum volumes determine the upper and lower volume thresholds, respectively, for each cube. These thresholds are used in the remaining stages to classify mutations in the test set.