By Jonah Kohen
Advised by Prof. Brian Chen
 1) Who are we and what do we do?
 2) DNA and p53: Basics
 3) Research problem
 4) Approach 1: electrostatic complementarity
across whole region.
 5) Approach 2: complementarity at the 14
binding amino acids.
 6) Approach 3: Cubes
 7) Conclusions
 Jonah Kohen – B.S. Computer Engineering
(expected May 2015), Lehigh University
 Sara Grogan – B.S. IBE Chemical Engineer,
minor Biotechnology (expected May 2017),
Lehigh University
 Prof. Brian Chen – P.C. Rossin Assistant
Professor of Computer Science and
Engineering at Lehigh University
 Prof. Chen works in structural bioinformatics
and has created programs modeling
biomolecule interactions to aid in
bioinformatics research.
 I chose to study p53 because it plays a pivotal
role in cancer.
 Electrostatic analysis of the interaction
between DNA and p53 mutants can be used
to predict whether or not the mutation will
lead to cancer.
 2) DNA and p53: Basics
 The DNA in the nucleus of cells contains the
instructions for the production of proteins.
p53 is one such protein.
 Proteins are large molecules composed of
amino acids that perform certain functions
like cellular metabolic processes.
 p53 performs its function by binding to a
particular region on the DNA.
 p53 becomes activated in response to cellular
stress, for example DNA damage.
 p53 in turn activates DNA repair proteins,
suspends cell division, and initiates apoptosis
(cell death).
 These damage control mechanisms help
prevent cancer.
[Figure: p53 responses to damage: repair, suspend division, cell death.]
 When functioning normally, p53 suppresses
the proliferation of cancer cells.
 Mutations to p53 may hinder this function.
 Mutations in the p53 tumor suppressor are
the most frequently observed genetic
alterations in human cancer.
 Each of these variants may carry one or
several substitutions.
[Figure: three p53 proteins interacting with DNA.]
 A p53 mutant is “active” if this variant of p53
is functioning normally.
 A p53 mutant is “inactive” if this variant of
p53 is unable to function normally.
[Figure labels: active = good, inactive = bad.]
 To design an algorithm that can reliably
classify a p53 mutant as active or inactive.
 Several predictors have been proposed, but
most are unreliable.
 A reliable predictor would help us diagnose a
mutation as cancerous or not.
 p53 is composed of 393 amino acids. The region responsible for DNA binding spans amino acids 102 to 292.
 Within this region, I am looking only at 14
amino acids that directly bind to DNA.
 Previous research has found that these 14
amino acids are: a119, a276, n239, n247,
r248, r273, r280, r283, c275, c277, l120,
m243, s121, and s241.
[Figure: binding amino acids labeled in the structure: l120, s121, n239, s241, r248, r273, c277, r283.]
 3) Research problem
 If the 273rd amino acid of p53 (arginine) is
changed to histidine (abbreviated r273h), p53
is inactivated.
 If the 273rd amino acid of p53 is changed to
histidine AND the 263rd amino acid of p53
(asparagine) is changed to valine (abbrev.
r273h/n263v), p53 remains functional.
 Why are mutants like r273h inactive while
other mutants like r273h/n263v active?
 Data set of 541 PDB files, each describing a different p53 mutant.
 143 active mutants, 77 involving the 14 key
amino acids.
 398 inactive mutants, 155 involving the 14
key amino acids.
 Activity determined by in vivo analysis.
 Source: Richard H. Lathrop, UC Irvine.
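The mutant names used here (for example r273h/n263v) encode one or more substitutions as original amino acid, position, new amino acid. A minimal sketch of how such names could be parsed and checked against the 14 binding amino acids follows. The function names are hypothetical, and accepting both "/" and "_" as separators is an assumption based on the two spellings that appear in the slides.

```python
import re

# The 14 DNA-binding amino acids listed on the slides, as (letter, position).
BINDING_RESIDUES = {
    ("a", 119), ("a", 276), ("n", 239), ("n", 247), ("r", 248), ("r", 273),
    ("r", 280), ("r", 283), ("c", 275), ("c", 277), ("l", 120), ("m", 243),
    ("s", 121), ("s", 241),
}
BINDING_POSITIONS = {position for _, position in BINDING_RESIDUES}

# One substitution: original letter, position, new letter (e.g. "r273h").
SUBSTITUTION = re.compile(r"^([a-z])(\d+)([a-z])$")


def parse_mutant(name):
    """Split a mutant name into (original, position, new) tuples."""
    substitutions = []
    for part in re.split(r"[/_]", name.strip().lower()):
        match = SUBSTITUTION.match(part)
        if match is None:
            raise ValueError(f"unrecognized substitution: {part!r}")
        original, position, new = match.groups()
        substitutions.append((original, int(position), new))
    return substitutions


def touches_binding_residue(name):
    """True if any substitution falls on one of the 14 binding positions."""
    return any(pos in BINDING_POSITIONS for _, pos, _ in parse_mutant(name))
```

A filter like `touches_binding_residue` is one way the subset of mutants "involving the 14 key amino acids" could be selected from the full data set.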
[Figure: PDB file of unmutated p53 viewed in PyMOL (3D structure), next to a segment of the same PDB file viewed as text.]
Mutations involving any of the 14 binding amino acids (one or more), for example r273h/s240q,
lead to observable changes in structural and electrostatic properties,
which we want to use to reliably classify p53 mutants as active or inactive.
 p53 and DNA are both primarily negatively
charged molecules. However, p53 has
positive pockets that interact with the
negatively charged DNA.
 The electrostatic complementarity region is
defined as a region where negatively charged
DNA overlaps with positively charged p53.
[Figure: isopotential levels +1, +2, +3 and -1, -2, -3; labels: protein +1 isopotential, DNA -1 isopotential.]
 Compute +1 isopotential around p53 mutant,
-1 isopotential around DNA, and find the
electrostatic complementarity region between
the two.
 Isopotentials generated by “surfaceExtractor”,
Boolean intersections generated by VASP
(Volumetric Analysis of Surface Properties).
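The idea of intersecting the two isopotential regions can be illustrated on a voxel grid. The sketch below is not surfaceExtractor or VASP: it assumes a toy Coulomb-like potential from point charges in arbitrary units, and all names and numbers are hypothetical. It only shows the Boolean intersection "protein potential at least +1 AND DNA potential at most -1" and the resulting volume.

```python
import itertools
import math


def potential(point, charges):
    """Toy Coulomb-like potential (arbitrary units) from (x, y, z, q) charges."""
    total = 0.0
    for x, y, z, q in charges:
        distance = math.dist(point, (x, y, z))
        total += q / max(distance, 0.5)  # clamp to avoid division by zero
    return total


def complementarity_volume(protein_charges, dna_charges, lo, hi, step):
    """Volume of the grid region where protein >= +1 and DNA <= -1."""
    n = int(round((hi - lo) / step))
    axis = [lo + (i + 0.5) * step for i in range(n)]  # voxel centers
    count = 0
    for point in itertools.product(axis, repeat=3):
        inside_protein = potential(point, protein_charges) >= 1.0
        inside_dna = potential(point, dna_charges) <= -1.0
        if inside_protein and inside_dna:
            count += 1
    return count * step ** 3


# Hypothetical toy system: one positive "protein" charge, one negative "DNA" charge.
toy_protein = [(0.0, 0.0, 0.0, 3.0)]
toy_dna = [(2.0, 0.0, 0.0, -3.0)]
toy_volume = complementarity_volume(toy_protein, toy_dna, -4.0, 6.0, 0.5)
```

With these toy charges each isopotential region is a sphere of radius 3, so the complementarity region is the lens where the two spheres overlap.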
 The -1 isopotential region of the p53-
binding DNA motif.
[Figure: DNA -1 isopotential; yellow indicates negative charge.]
[Figure: intersection of DNA and unmutated (wild type) p53.]
 4) Approach 1: electrostatic complementarity across whole region.
 The volume of the electrostatic
complementarity region between amino acids
102-292 is computed for every mutant.
[Figure: visual representation of a volumetric computation.]
[Histogram: relative frequency (0 to 3/10) of active and inactive mutants against complementarity region volume (360 to 600).]
 Given the complementarity region volume of an arbitrary mutant, it is usually not possible to determine whether the mutant is active or inactive, because over most of the range the active and inactive distributions overlap.
[Histogram: the same plot annotated with three example mutants. Volume = 380: inactive. Volume = 580: inactive. Volume = 480: undetermined.]
 5) Approach 2: complementarity at the 14 binding amino acids.
 What happens if we only analyze the volume
near the 14 amino acids directly involved in
DNA binding?
 Again, these amino acids are: a119, a276,
n239, n247, r248, r273, r280, r283, c275,
c277, l120, m243, s121, and s241.
 From the entire electrostatic complementarity region, we take all subregions that are within five Angstroms of any of the 14 binding amino acids. The rest of the region is discarded.
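The restriction to the binding amino acids can be sketched as a distance filter over the voxels of the complementarity region. This assumes the cutoff is a distance of 5 Angstroms measured to atom coordinates of the binding amino acids; the function names and the coordinates in the usage lines are hypothetical.

```python
import math


def near_binding_site(voxel_centers, binding_atoms, cutoff=5.0):
    """Keep voxel centers within `cutoff` of at least one binding-residue atom."""
    return [
        voxel
        for voxel in voxel_centers
        if any(math.dist(voxel, atom) <= cutoff for atom in binding_atoms)
    ]


def region_volume(voxel_centers, voxel_size):
    """Volume of a region given as a list of voxel centers."""
    return len(voxel_centers) * voxel_size ** 3


# Hypothetical coordinates, for illustration only.
example_voxels = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
example_atoms = [(0.0, 0.0, 0.0)]
kept = near_binding_site(example_voxels, example_atoms)
```

Everything outside the cutoff is dropped before the volume is computed, which mirrors the "rest of the region is discarded" step.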
[Figure: electrostatic complementarity region for unmutated p53 within 5 Angstroms of the 14 binding amino acids.]
 Results of this test indicate that if the region volume is below a certain threshold, the mutant is always inactive in this data set.
 These lower volumes correspond to real life
mutations of r248 and r283.
[Histogram: relative frequency (0 to 0.25) of active and inactive mutants against complementarity region volume.]
[Figure: the Approach 1 and Approach 2 histograms shown together for comparison.]
 6) Approach 3: Cubes
 We divide the EC region into 39 sub regions
defined by cubes. We calculate the volume of
electrostatic complementarity in each cube.
[Figure: intersection region for wild type p53 separated into cubes.]
 Each mutation is an observation with 39
features. Each feature is the volume of the
electrostatic complementarity region
contained in a particular cube.
Mutation Name   Cube 11     Cube 60     Cube 101
r273r           41.082862   78.972403   16.34071
 The collection of all observations is a matrix
with 39 columns (cube volumes) and 232
rows (mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635
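One way to obtain per-cube features like the rows above is to bin the voxels of the complementarity region into axis-aligned cubes and sum the volume in each. The sketch below assumes a voxel representation and indexes cubes by integer (i, j, k) triples; the slides number cubes differently (Cube 3, Cube 11, ...) and do not state the cube edge length, so both are assumptions here.

```python
import math
from collections import Counter


def cube_features(voxel_centers, voxel_size, cube_edge):
    """Map each cube index (i, j, k) to the complementarity volume inside it."""
    counts = Counter(
        tuple(math.floor(coord / cube_edge) for coord in center)
        for center in voxel_centers
    )
    return {cube: n * voxel_size ** 3 for cube, n in counts.items()}


def feature_row(features, cube_order):
    """Fixed-length feature vector; cubes with no volume get 0.0."""
    return [features.get(cube, 0.0) for cube in cube_order]


# Hypothetical voxels, for illustration only.
demo_voxels = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (5.5, 0.5, 0.5)]
demo_features = cube_features(demo_voxels, voxel_size=1.0, cube_edge=5.0)
demo_row = feature_row(demo_features, [(0, 0, 0), (1, 0, 0), (2, 0, 0)])
```

Stacking one such row per mutation, with the same cube order for every row, gives the observation matrix described above.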
 A “true positive” is a correctly recognized
active mutant. A “true negative” is a correctly
recognized inactive mutant.
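With "active" as the positive class, as defined above, sensitivity and specificity follow from the four confusion counts. The sketch below uses hypothetical labels and function names purely to show the bookkeeping.

```python
def confusion_counts(actual, predicted):
    """Count outcomes with 'active' as the positive class."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == "active" and p == "active" for a, p in pairs),
        "fn": sum(a == "active" and p == "inactive" for a, p in pairs),
        "tn": sum(a == "inactive" and p == "inactive" for a, p in pairs),
        "fp": sum(a == "inactive" and p == "active" for a, p in pairs),
    }


def sensitivity(counts):
    """Fraction of active mutants that were recognized as active."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


def specificity(counts):
    """Fraction of inactive mutants that were recognized as inactive."""
    return counts["tn"] / (counts["tn"] + counts["fp"])


# Hypothetical labels, for illustration only.
actual = ["active", "active", "inactive", "inactive", "inactive"]
predicted = ["active", "inactive", "inactive", "inactive", "active"]
counts = confusion_counts(actual, predicted)
```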
 7) Conclusions
 Volumetric analysis of electrostatic
complementarity regions holds promising
applications to p53 study.
 A prediction algorithm with high sensitivity is
possible from analysis of cube data.
 Our hope is that, using more refined analysis
techniques, the prediction algorithm can be
made even more specific.
 Support Vector Machines (SVM) can be used to compute a decision boundary that separates active mutants from inactive mutants based on cube volumes.
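As a rough illustration of the SVM idea, the sketch below trains a linear SVM by batch subgradient descent on the regularized hinge loss. This is a swapped-in minimal technique, not the method or library used in the project, and the two-feature data stands in for standardized cube volumes; all names and values are hypothetical.

```python
def train_linear_svm(rows, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss.

    rows: feature vectors; labels: +1 (active) or -1 (inactive).
    """
    n, dim = len(rows), len(rows[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (active) or -1 (inactive) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized features for four mutants.
toy_rows = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
toy_labels = [1, 1, -1, -1]
w, b = train_linear_svm(toy_rows, toy_labels)
```

On real cube data the 39 features would need scaling first, and a kernel SVM from an established library would likely be the more practical choice.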
[Histogram: Intersection Volumes +1. Relative frequency (0 to 3/10) of active and inactive mutants against complementarity region volume (360 to 600).]
[Histogram: Binding Acid Regions +1. Relative frequency (0 to 0.3) of active and inactive mutants against complementarity region volume (210 to 315, plus a "More" bin).]
 In both histograms, all active mutants fall between a lower and an upper bound.
 If these thresholds can be detected, mutants that fall outside them can be predicted to be inactive.
 Using the entire intersection volume for this purpose is not as accurate.
 For each cube, we calculate the lower threshold (minimum volume across all mutations in the first training set, which contains only active mutants) and the upper threshold (maximum volume across the same mutations).
Mutation Cube 3 Cube 4 Cube 11
r158l_s227f_n239y 7.445915 10.34335 41.34778
r249m_n235k_n239y 5.358244 10.58604 37.30913
r273c_d281g_e285g 7.348197 10.26282 42.42635
Lower threshold 5.358244 10.26282 37.30913
Upper threshold 7.445915 10.58604 42.42635
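The threshold rows in the table above are the per-cube minimum and maximum. A minimal sketch that reproduces them from the three example rows follows; the function name and the dictionary layout are assumptions.

```python
def cube_thresholds(rows):
    """Per-cube (lower, upper) thresholds: min and max volume over all rows."""
    cubes = next(iter(rows.values())).keys()
    return {
        cube: (
            min(row[cube] for row in rows.values()),
            max(row[cube] for row in rows.values()),
        )
        for cube in cubes
    }


# The three example rows from the table, keyed by cube number.
example_rows = {
    "r158l_s227f_n239y": {3: 7.445915, 4: 10.34335, 11: 41.34778},
    "r249m_n235k_n239y": {3: 5.358244, 4: 10.58604, 11: 37.30913},
    "r273c_d281g_e285g": {3: 7.348197, 4: 10.26282, 11: 42.42635},
}
thresholds = cube_thresholds(example_rows)
```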
 The remaining data set is divided into two
parts: a second training set to determine
which cubes indicate inactivity, and a test set
to analyze the predictive power of the chosen
cubes.
 For each mutation, 39 true/false values (one for each cube) are computed. Each cube whose volume is above the upper threshold or below the lower threshold set for that cube in stage 1 gets a “true” value.
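The true/false values can be sketched as a range check per cube. The thresholds below are the ones from the example table; the mutant volumes are hypothetical, and only three of the 39 cubes are shown.

```python
def violation_flags(volumes, thresholds):
    """True for each cube whose volume lies outside its [lower, upper] range."""
    return {
        cube: not (lower <= volumes[cube] <= upper)
        for cube, (lower, upper) in thresholds.items()
    }


# Thresholds from the example table (cube number -> (lower, upper)).
stage1_thresholds = {
    3: (5.358244, 7.445915),
    4: (10.26282, 10.58604),
    11: (37.30913, 42.42635),
}
# Hypothetical cube volumes for one mutant.
hypothetical_mutant = {3: 4.9, 4: 10.4, 11: 43.0}
flags = violation_flags(hypothetical_mutant, stage1_thresholds)
num_violations = sum(flags.values())
```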
[Diagram: the Total Dataset is split into Training Set 1 (only actives), used in Stage 1, and the dataset minus training set, which is split into Training Set 2 and the Test Set. Stage 2: thresholds generated in Training Set 1 are used to extract inactivity indicators from Training Set 2. Inactivity indicators are then applied to the test set.]
 Active mutants have, on average, far fewer
“true” values (threshold violations) than
inactive mutants. Depending on the training
and test sets, the worst active mutants had
anywhere from 3-6 true values, while
inactives can have more than 13.
Threshold violations (n)   # of active mutants   # of inactive mutants
1                          6                     46
2                          2                     29
3                          1                     25
4                          0                     9
5                          0                     5
6                          0                     1
7                          -                     0
8                          -                     1
9                          -                     0
10                         -                     0
11                         -                     0
12                         -                     0
13                         -                     0
Num violators              9                     116
 A simple example: mutations A-E have a violation in a cube (1-5) wherever the table shows “TRUE”.
 For each cube, the number of mutations with true values in that cube and 0, 1, 2, etc. other violations is counted.
Mutation 1 2 3 4 5
A TRUE TRUE TRUE TRUE
B TRUE TRUE TRUE TRUE
C TRUE TRUE TRUE TRUE
D TRUE
E TRUE TRUE
Number of violators with n other violations:
Cube   0 other   1 other   2 other   3 other
1      3         3         2         2
2      3         2         2         2
3      3         3         3         3
4      3         3         3         3
5      3         3         2         2
 The cubes marked in red are the cubes of interest (in the table above, cubes 3 and 4, whose counts stay at 3). The number of mutations with violations in these cubes does not change between 0 and 3 other threshold violations.
 For each cube, the number of mutations in the second training set with a threshold violation in that cube was counted.
 Separate counts were generated for mutations that had a violation in that cube along with violations in 1, 2, 3, etc. other cubes.
 If all active mutations have 3 or fewer threshold violations, then a cube whose count stays the same between 0 and 3 other violations is violated only by mutations with more than 3 violations, that is, only by inactive mutations.
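The counting step from the simple example can be sketched as follows. Two things are assumptions: the placement of the TRUE values for mutations A-E (the column positions did not survive in the slide text, so one arrangement consistent with the counts table is used), and the reading of each count as "violators with at least k other violations", which is the reading under which the example counts add up.

```python
# Assumed placement of TRUE values: mutation -> set of violated cubes.
violations = {
    "A": {1, 3, 4, 5},
    "B": {1, 2, 3, 4},
    "C": {2, 3, 4, 5},
    "D": {2},
    "E": {1, 5},
}


def violators_with_at_least(violations, cube, k):
    """Mutations violating `cube` that also violate at least k other cubes."""
    return sum(
        1
        for cubes in violations.values()
        if cube in cubes and len(cubes) - 1 >= k
    )


def count_table(violations, cubes, max_other=3):
    """For each cube, the counts for 0..max_other other violations."""
    return {
        cube: [violators_with_at_least(violations, cube, k)
               for k in range(max_other + 1)]
        for cube in cubes
    }


def inactivity_indicators(table):
    """Cubes whose count is identical for every k (and not zero)."""
    return [
        cube
        for cube, counts in table.items()
        if counts[0] > 0 and len(set(counts)) == 1
    ]


table = count_table(violations, cubes=[1, 2, 3, 4, 5])
indicators = inactivity_indicators(table)
```

A constant row means every mutation violating that cube also has at least 3 other violations, which is what makes the cube an inactivity indicator under the "3 or fewer" assumption for actives.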
 These inactivity indicators are used to classify
the mutants in the test set as either active or
inactive.
 The algorithm searches for inactive mutants.
Any mutant with more than three threshold
violations is automatically counted as
inactive.
 From the remaining pool, all mutants with threshold violations in cubes whose counts do not change between 0 and 3 concomitant violations are also counted as inactive.
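The two classification rules can be sketched as one small function. The cutoff of three violations comes from the text; the function name and the cube numbers in the usage lines are hypothetical.

```python
def classify(violated_cubes, indicator_cubes, max_active_violations=3):
    """Label a mutant from the set of cubes in which it violates a threshold."""
    # Rule 1: more violations than any active mutant is assumed to have.
    if len(violated_cubes) > max_active_violations:
        return "inactive"
    # Rule 2: a violation in any cube identified as an inactivity indicator.
    if violated_cubes & indicator_cubes:
        return "inactive"
    return "active"


# Hypothetical indicator cubes and mutants, for illustration only.
example_indicators = {7, 12}
label_many = classify({1, 2, 3, 4}, example_indicators)
label_indicator = classify({7}, example_indicators)
label_clean = classify({1, 2}, example_indicators)
```

Because the function only ever searches for evidence of inactivity, a mutant with no violations at all is labeled active by default.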
 Cubes that guarantee inactivity have still not
been completely identified.
 The best inactivity indicators generated thus
far are not the most sensitive.
 Next steps involve either a refinement of the
threshold violations algorithm or the use of
other methods.

Weitere ähnliche Inhalte

Was ist angesagt?

Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...CrimsonpublishersCancer
 
Ligation of DNA fragments Practical
Ligation of DNA fragments Practical Ligation of DNA fragments Practical
Ligation of DNA fragments Practical Sabahat Ali
 
Honors Symposium Spring2009
Honors Symposium Spring2009Honors Symposium Spring2009
Honors Symposium Spring2009sweetflutterbyes
 
University of Texas at Austin
University of Texas at AustinUniversity of Texas at Austin
University of Texas at Austinbutest
 
Ivan Sotelo Poster FINAL Ver
Ivan Sotelo Poster FINAL Ver Ivan Sotelo Poster FINAL Ver
Ivan Sotelo Poster FINAL Ver Ivan Sotelo
 
SPR for Aptamer-Based Molecular Interactions in Programmable Materials
SPR for Aptamer-Based Molecular Interactions in Programmable MaterialsSPR for Aptamer-Based Molecular Interactions in Programmable Materials
SPR for Aptamer-Based Molecular Interactions in Programmable MaterialsReichertSPR
 
Nat_Chem_Biol_GPR30_2006
Nat_Chem_Biol_GPR30_2006Nat_Chem_Biol_GPR30_2006
Nat_Chem_Biol_GPR30_2006Alex Kiselyov
 
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)Jack Gengcheng YANG
 
Expanding Surface Plasmon Resonance Capabilities with Reichert
Expanding Surface Plasmon Resonance Capabilities with ReichertExpanding Surface Plasmon Resonance Capabilities with Reichert
Expanding Surface Plasmon Resonance Capabilities with ReichertReichertSPR
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand upChris Southan
 
Population fitness and genetic load of
Population fitness and genetic load ofPopulation fitness and genetic load of
Population fitness and genetic load ofThanka Elango
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSteve Flynn
 
Molecular Biology Lab Poster
Molecular Biology Lab PosterMolecular Biology Lab Poster
Molecular Biology Lab PosterMuhammad Jalal
 

Was ist angesagt? (20)

Abhishek RBF final
Abhishek RBF finalAbhishek RBF final
Abhishek RBF final
 
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
 
SERMACSposterDr.V
SERMACSposterDr.VSERMACSposterDr.V
SERMACSposterDr.V
 
CCMB document
CCMB documentCCMB document
CCMB document
 
BCSRCv1.3
BCSRCv1.3BCSRCv1.3
BCSRCv1.3
 
Ligation of DNA fragments Practical
Ligation of DNA fragments Practical Ligation of DNA fragments Practical
Ligation of DNA fragments Practical
 
Honors Symposium Spring2009
Honors Symposium Spring2009Honors Symposium Spring2009
Honors Symposium Spring2009
 
University of Texas at Austin
University of Texas at AustinUniversity of Texas at Austin
University of Texas at Austin
 
Ivan Sotelo Poster FINAL Ver
Ivan Sotelo Poster FINAL Ver Ivan Sotelo Poster FINAL Ver
Ivan Sotelo Poster FINAL Ver
 
SPR for Aptamer-Based Molecular Interactions in Programmable Materials
SPR for Aptamer-Based Molecular Interactions in Programmable MaterialsSPR for Aptamer-Based Molecular Interactions in Programmable Materials
SPR for Aptamer-Based Molecular Interactions in Programmable Materials
 
LSD1 - bmc-paper
LSD1 - bmc-paperLSD1 - bmc-paper
LSD1 - bmc-paper
 
Nat_Chem_Biol_GPR30_2006
Nat_Chem_Biol_GPR30_2006Nat_Chem_Biol_GPR30_2006
Nat_Chem_Biol_GPR30_2006
 
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)
The invention of sprycel from benchtop to bedside (Gengcheng Jack Yang)
 
Caryne Stacia Neuropharm
Caryne Stacia NeuropharmCaryne Stacia Neuropharm
Caryne Stacia Neuropharm
 
Expanding Surface Plasmon Resonance Capabilities with Reichert
Expanding Surface Plasmon Resonance Capabilities with ReichertExpanding Surface Plasmon Resonance Capabilities with Reichert
Expanding Surface Plasmon Resonance Capabilities with Reichert
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
 
Population fitness and genetic load of
Population fitness and genetic load ofPopulation fitness and genetic load of
Population fitness and genetic load of
 
Interaction between ligand and receptor
Interaction between ligand and receptorInteraction between ligand and receptor
Interaction between ligand and receptor
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInal
 
Molecular Biology Lab Poster
Molecular Biology Lab PosterMolecular Biology Lab Poster
Molecular Biology Lab Poster
 

Andere mochten auch

The Deep Impulses that Drives us Online by Dr.Mahboob Khan Phd
The Deep Impulses that Drives us Online by Dr.Mahboob Khan PhdThe Deep Impulses that Drives us Online by Dr.Mahboob Khan Phd
The Deep Impulses that Drives us Online by Dr.Mahboob Khan PhdHealthcare consultant
 
Panamá: la ruta por descubrir
Panamá: la ruta por descubrirPanamá: la ruta por descubrir
Panamá: la ruta por descubrirAlejandra2804
 
Magazine music evaluation
Magazine music evaluationMagazine music evaluation
Magazine music evaluationcassidy111
 
Complexity on rise from atoms to human beings to human civilization, a compl...
Complexity on rise  from atoms to human beings to human civilization, a compl...Complexity on rise  from atoms to human beings to human civilization, a compl...
Complexity on rise from atoms to human beings to human civilization, a compl...Healthcare consultant
 
Healthcare Business Ideas by Dr.Mahboob Khan Phd
Healthcare Business Ideas by Dr.Mahboob Khan PhdHealthcare Business Ideas by Dr.Mahboob Khan Phd
Healthcare Business Ideas by Dr.Mahboob Khan PhdHealthcare consultant
 
Postmodern Analysis
Postmodern AnalysisPostmodern Analysis
Postmodern Analysisnazminkalam
 
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burner
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burnerΣυντήρηση καυστήρα, επισκευή καυστήρα, oil burner
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burnerΘανάσης Μιτζιφίρης
 
complete cv
complete cvcomplete cv
complete cvphutheho
 
5 Barriers that Block Salespeople from Hitting Quota
5 Barriers that Block Salespeople from Hitting Quota5 Barriers that Block Salespeople from Hitting Quota
5 Barriers that Block Salespeople from Hitting QuotaRalph Barsi
 

Andere mochten auch (9)

The Deep Impulses that Drives us Online by Dr.Mahboob Khan Phd
The Deep Impulses that Drives us Online by Dr.Mahboob Khan PhdThe Deep Impulses that Drives us Online by Dr.Mahboob Khan Phd
The Deep Impulses that Drives us Online by Dr.Mahboob Khan Phd
 
Panamá: la ruta por descubrir
Panamá: la ruta por descubrirPanamá: la ruta por descubrir
Panamá: la ruta por descubrir
 
Magazine music evaluation
Magazine music evaluationMagazine music evaluation
Magazine music evaluation
 
Complexity on rise from atoms to human beings to human civilization, a compl...
Complexity on rise  from atoms to human beings to human civilization, a compl...Complexity on rise  from atoms to human beings to human civilization, a compl...
Complexity on rise from atoms to human beings to human civilization, a compl...
 
Healthcare Business Ideas by Dr.Mahboob Khan Phd
Healthcare Business Ideas by Dr.Mahboob Khan PhdHealthcare Business Ideas by Dr.Mahboob Khan Phd
Healthcare Business Ideas by Dr.Mahboob Khan Phd
 
Postmodern Analysis
Postmodern AnalysisPostmodern Analysis
Postmodern Analysis
 
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burner
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burnerΣυντήρηση καυστήρα, επισκευή καυστήρα, oil burner
Συντήρηση καυστήρα, επισκευή καυστήρα, oil burner
 
complete cv
complete cvcomplete cv
complete cv
 
5 Barriers that Block Salespeople from Hitting Quota
5 Barriers that Block Salespeople from Hitting Quota5 Barriers that Block Salespeople from Hitting Quota
5 Barriers that Block Salespeople from Hitting Quota
 

Ähnlich wie P53_Final_Presentation

Predicting phenotype from genotype with machine learning
Predicting phenotype from genotype with machine learningPredicting phenotype from genotype with machine learning
Predicting phenotype from genotype with machine learningPatricia Francis-Lyon
 
ShRNA-specific regulation of FMNL2 expression in P19 cells
ShRNA-specific regulation of FMNL2 expression in P19 cellsShRNA-specific regulation of FMNL2 expression in P19 cells
ShRNA-specific regulation of FMNL2 expression in P19 cellsYousefLayyous
 
IDENTIFICATION OF PROTEIN BINDING SITE.docx
IDENTIFICATION OF PROTEIN BINDING SITE.docxIDENTIFICATION OF PROTEIN BINDING SITE.docx
IDENTIFICATION OF PROTEIN BINDING SITE.docxSNEHA AGRAWAL GUPTA
 
Structure Prediction of WDR13 and a study of its Interacting Partners
Structure Prediction of WDR13 and a study of its Interacting PartnersStructure Prediction of WDR13 and a study of its Interacting Partners
Structure Prediction of WDR13 and a study of its Interacting PartnersAshish Baghudana
 
Network analysis of cancer metabolism: A novel route to precision medicine
Network analysis of cancer metabolism: A novel route to precision medicineNetwork analysis of cancer metabolism: A novel route to precision medicine
Network analysis of cancer metabolism: A novel route to precision medicineVarshit Dusad
 
DNA structure 2.pptx molecular biology.pptx
DNA structure 2.pptx molecular biology.pptxDNA structure 2.pptx molecular biology.pptx
DNA structure 2.pptx molecular biology.pptxGiDMOh
 
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...Haley D. Norman
 
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2Poster_RosanaLopez_SULI_Summer2011_aug8_final_2
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2Rosana Lopez
 
PhD Poster - UKEMS Conference 2008
PhD Poster - UKEMS Conference 2008PhD Poster - UKEMS Conference 2008
PhD Poster - UKEMS Conference 2008Donna Johnson
 
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...Topological analysis of coexpression networks in neoplastic tissues (BITS2012...
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...Roberto Anglani
 
Modeling DNA unzipping in the presence of bound proteins
Modeling DNA unzipping in the presence of bound proteinsModeling DNA unzipping in the presence of bound proteins
Modeling DNA unzipping in the presence of bound proteinsguestb5dd5e
 
Biochemistry Question Bank
Biochemistry Question BankBiochemistry Question Bank
Biochemistry Question BankShivankan Kakkar
 
05102016CH597 –computitional biochemistryDue date 0513201.docx
05102016CH597 –computitional biochemistryDue date 0513201.docx05102016CH597 –computitional biochemistryDue date 0513201.docx
05102016CH597 –computitional biochemistryDue date 0513201.docxhoney725342
 
GENE MAPPING.pptx
GENE MAPPING.pptxGENE MAPPING.pptx
GENE MAPPING.pptxKajalPawade
 
Presentation july 28_2015
Presentation july 28_2015Presentation july 28_2015
Presentation july 28_2015gkoytiger
 

Ähnlich wie P53_Final_Presentation (20)

Predicting phenotype from genotype with machine learning
Predicting phenotype from genotype with machine learningPredicting phenotype from genotype with machine learning
Predicting phenotype from genotype with machine learning
 
ShRNA-specific regulation of FMNL2 expression in P19 cells
ShRNA-specific regulation of FMNL2 expression in P19 cellsShRNA-specific regulation of FMNL2 expression in P19 cells
ShRNA-specific regulation of FMNL2 expression in P19 cells
 
Gene Array Analyzer
Gene Array AnalyzerGene Array Analyzer
Gene Array Analyzer
 
Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
 
IDENTIFICATION OF PROTEIN BINDING SITE.docx
IDENTIFICATION OF PROTEIN BINDING SITE.docxIDENTIFICATION OF PROTEIN BINDING SITE.docx
IDENTIFICATION OF PROTEIN BINDING SITE.docx
 
EcoR124I_PR
EcoR124I_PREcoR124I_PR
EcoR124I_PR
 
Structure Prediction of WDR13 and a study of its Interacting Partners
Structure Prediction of WDR13 and a study of its Interacting PartnersStructure Prediction of WDR13 and a study of its Interacting Partners
Structure Prediction of WDR13 and a study of its Interacting Partners
 
Network analysis of cancer metabolism: A novel route to precision medicine
Network analysis of cancer metabolism: A novel route to precision medicineNetwork analysis of cancer metabolism: A novel route to precision medicine
Network analysis of cancer metabolism: A novel route to precision medicine
 
DNA structure 2.pptx molecular biology.pptx
DNA structure 2.pptx molecular biology.pptxDNA structure 2.pptx molecular biology.pptx
DNA structure 2.pptx molecular biology.pptx
 
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
 
AR_ResearchProspectus
AR_ResearchProspectusAR_ResearchProspectus
AR_ResearchProspectus
 
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2Poster_RosanaLopez_SULI_Summer2011_aug8_final_2
Poster_RosanaLopez_SULI_Summer2011_aug8_final_2
 
PhD Poster - UKEMS Conference 2008
PhD Poster - UKEMS Conference 2008PhD Poster - UKEMS Conference 2008
PhD Poster - UKEMS Conference 2008
 
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...Topological analysis of coexpression networks in neoplastic tissues (BITS2012...
Topological analysis of coexpression networks in neoplastic tissues (BITS2012...
 
Modeling DNA unzipping in the presence of bound proteins
Modeling DNA unzipping in the presence of bound proteinsModeling DNA unzipping in the presence of bound proteins
Modeling DNA unzipping in the presence of bound proteins
 
GENE SEQUENCING
GENE SEQUENCINGGENE SEQUENCING
GENE SEQUENCING
 
Biochemistry Question Bank
Biochemistry Question BankBiochemistry Question Bank
Biochemistry Question Bank
 
05102016CH597 –computitional biochemistryDue date 0513201.docx
05102016CH597 –computitional biochemistryDue date 0513201.docx05102016CH597 –computitional biochemistryDue date 0513201.docx
05102016CH597 –computitional biochemistryDue date 0513201.docx
 
GENE MAPPING.pptx
GENE MAPPING.pptxGENE MAPPING.pptx
GENE MAPPING.pptx
 
Presentation july 28_2015
Presentation july 28_2015Presentation july 28_2015
Presentation july 28_2015
 

P53_Final_Presentation

  • 1. By Jonah Kohen Advised by Prof. Brian Chen
  • 2.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 3.  Jonah Kohen – B.S. Computer Engineering (expected May 2015), Lehigh University  Sara Grogan – B.S. IBE Chemical Engineer, minor Biotechnology (expected May 2017), Lehigh University  Prof. Brian Chen – P.C. Rossin Assistant Professor of Computer Science and Engineering at Lehigh University Sara Grogan Brian Chen
  • 4.  Prof. Chen works in structural bioinformatics and has created programs modeling biomolecule interactions to aid in bioinformatics research.  I chose to study p53 because it plays a pivotal role in cancer.
  • 5.  Electrostatic analysis of the interaction between DNA and p53 mutants can be used to predict whether or not the mutation will lead to cancer.
  • 6.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 7.  The DNA in the nucleus of cells contains the instructions for the production of proteins. p53 is one such protein.  Proteins are large molecules composed of amino acids that perform certain functions like cellular metabolic processes.
  • 8.  p53 performs its function by binding to a particular region on the DNA.
  • 9.  p53 becomes activated in response to cellular stress, for example DNA damage.  p53 in turn activates DNA repair proteins, suspends cell division, and initiates apoptosis (cell death).  These damage control mechanisms help prevent cancer. Repair Suspend Division Cell Death
  • 10.  When functioning normally, p53 suppresses the proliferation of cancer cells.  Mutations to p53 may hinder this function.  Mutations in the p53 tumor suppressor are the most frequently observed genetic alterations in human cancer.  Each of these variants may carry one or several substitutions. 3 p53 proteins interacting with DNA.
  • 11.  A p53 mutant is “active” if this variant of p53 is functioning normally.  A p53 mutant is “inactive” if this variant of p53 is unable to function normally. Active: Good Inactive: Bad
  • 12.  Goal: to design an algorithm that can reliably classify a p53 mutant as active or inactive.  Several predictors have been proposed, but most are unreliable.  A reliable predictor would help us diagnose a mutation as cancerous or not.
  • 13.  p53 is composed of 393 amino acids. The region responsible for DNA binding spans amino acids 102-292.  Within this region, I am looking only at the 14 amino acids that directly bind to DNA.
  • 14.  Previous research has found that these 14 amino acids are: a119, a276, n239, n247, r248, r273, r280, r283, c275, c277, l120, m243, s121, and s241. (Figure labels: l120, s121, n239, s241, r248, r273, c277, r283.)
  • 15.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 16.  If the 273rd amino acid of p53 (arginine) is changed to histidine (abbreviated r273h), p53 is inactivated.
  • 17.  If the 273rd amino acid of p53 is changed to histidine AND the 263rd amino acid of p53 (asparagine) is changed to valine (abbreviated r273h/n263v), p53 remains functional.
  • 18.  Why are mutants like r273h inactive while other mutants like r273h/n263v are active?
  • 19.  Data set of 541 pdb files, each one describing a different p53 mutant.  143 active mutants, 77 involving the 14 key amino acids.  398 inactive mutants, 155 involving the 14 key amino acids.  Activity determined by in vivo analysis.  Source: Richard H. Lathrop, UC Irvine.
  • 20. PDB file of unmutated p53, viewed in PyMOL (3D structure), alongside a segment of the same PDB file viewed as text.
  • 21. Mutations involving any of the 14 binding amino acids (one or more), for example r273h/s240q. Leads to… Observable changes in structural and electrostatic properties. Which we want to use to… Reliably classify p53 mutants as active or inactive.
  • 22.  p53 and DNA are both primarily negatively charged molecules. However, p53 has positive pockets that interact with the negatively charged DNA.
  • 23.  The electrostatic complementarity region is defined as a region where negatively charged DNA overlaps with positively charged p53. (Figure: isopotential contours labeled +1, +2, +3 and -1, -2, -3, with the protein +1 isopotential and the DNA -1 isopotential marked.)
  • 24.  Compute +1 isopotential around p53 mutant, -1 isopotential around DNA, and find the electrostatic complementarity region between the two.  Isopotentials generated by “surfaceExtractor”, Boolean intersections generated by VASP (Volumetric Analysis of Surface Properties).
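The intersection step on this slide can be illustrated with a minimal sketch. It does not use surfaceExtractor or VASP. It assumes, hypothetically, that both electrostatic potentials have already been sampled on the same regular grid and stored as dictionaries keyed by voxel index; the function names, the grid spacing, and the toy values are illustrative only.

```python
# Hypothetical sketch: potentials are sampled on a shared regular grid and
# stored as {(i, j, k): potential in kT/e}. This is not the VASP algorithm.

def isopotential_region(potential, level):
    """Return the set of voxels enclosed by the isopotential at `level`.

    A positive level keeps voxels with potential >= level; a negative level
    keeps voxels with potential <= level.
    """
    if level >= 0:
        return {v for v, phi in potential.items() if phi >= level}
    return {v for v, phi in potential.items() if phi <= level}


def complementarity_region(protein_potential, dna_potential):
    """Voxels inside both the protein +1 and the DNA -1 isopotential."""
    protein_pos = isopotential_region(protein_potential, +1.0)
    dna_neg = isopotential_region(dna_potential, -1.0)
    return protein_pos & dna_neg  # Boolean intersection of the two regions


def region_volume(voxels, spacing):
    """Approximate volume as voxel count times the volume of one voxel."""
    return len(voxels) * spacing ** 3


# Toy data on a line of four voxels (values are made up for illustration).
protein = {(0, 0, 0): 2.5, (1, 0, 0): 1.2, (2, 0, 0): 0.4, (3, 0, 0): -0.3}
dna = {(0, 0, 0): -0.2, (1, 0, 0): -1.5, (2, 0, 0): -2.0, (3, 0, 0): -3.1}

region = complementarity_region(protein, dna)
volume = region_volume(region, spacing=0.5)
```

In the toy data only one voxel lies inside both isopotentials, so the region volume is the volume of a single voxel. A set-based grid keeps the Boolean intersection a one-line operation, at the cost of ignoring the surface geometry a real volumetric tool would handle.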
  • 25.  The -1 isopotential region of the p53-binding DNA motif. (Figure: DNA -1 isopotential; yellow indicates negative charge.)
  • 27. Intersection of DNA and unmutated (wild type) p53
  • 28.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 29.  The volume of the electrostatic complementarity region between amino acids 102-292 is computed for every mutant. This picture is the visual representation of a volumetric computation.
  • 30. (Histogram: relative frequency of active and inactive mutants versus complementarity region volume; x axis 360 to 600, y axis 0 to 3/10.)
  • 31.  Given a random complementarity region volume, it is usually not possible to determine whether the mutant is active or inactive. (Same histogram as the previous slide, with three examples.) Random mutant: Volume = 380, effect = inactive. Random mutant: Volume = 580, effect = inactive. Random mutant: Volume = 480, effect = ????
  • 32.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 33.  What happens if we only analyze the volume near the 14 amino acids directly involved in DNA binding?  Again, these amino acids are: a119, a276, n239, n247, r248, r273, r280, r283, c275, c277, l120, m243, s121, and s241.
  • 34.  From the entire electrostatic complementarity region, we take all subregions that are within 5 Angstroms of any of the 14 binding amino acids. The rest of the region is discarded. (Figure: electrostatic complementarity region for unmutated p53 within 5 Angstroms of the 14 binding amino acids.)
  • 35.  Results of this test indicate that if the region volume is below a certain threshold, the mutant was inactive in every case in this data set.  These lower volumes correspond to real-life mutations of r248 and r283. (Histogram: relative frequency of active and inactive mutants versus complementarity region volume; y axis 0 to 0.25.)
  • 36. (Side-by-side comparison of the Approach 1 and Approach 2 histograms: relative frequency of active and inactive mutants versus complementarity region volume.)
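A minimal sketch of this filtering step follows. It assumes the cutoff is a distance of 5 Angstroms, that the complementarity region is available as a list of voxel center coordinates, and that the binding residues are represented by atom coordinates. All names and coordinates are hypothetical and are not taken from the original tools.

```python
import math


def near_binding_residues(voxel_centers, residue_atoms, cutoff=5.0):
    """Keep voxel centers within `cutoff` Angstroms of any listed atom."""
    return [
        center
        for center in voxel_centers
        if any(math.dist(center, atom) <= cutoff for atom in residue_atoms)
    ]


# Toy coordinates in Angstroms (made up for illustration).
region_centers = [(0.0, 0.0, 0.0), (3.0, 3.0, 0.0), (10.0, 0.0, 0.0)]
binding_atoms = [(0.0, 0.0, 0.0)]  # hypothetical atom of one binding residue

kept = near_binding_residues(region_centers, binding_atoms)
```

The third voxel lies 10 Angstroms from the only atom, so it is discarded; the other two are kept.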
  • 37.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 38.  We divide the EC region into 39 subregions defined by cubes. We calculate the volume of electrostatic complementarity in each cube. (Figure: intersection region for wild type p53 separated into cubes.)
  • 39.  Each mutation is an observation with 39 features. Each feature is the volume of the electrostatic complementarity region contained in a particular cube. Example:
      Mutation Name    Cube 11      Cube 60      Cube 101
      r273r            41.082862    78.972403    16.34071
  • 40.  The collection of all observations is a matrix with 39 columns (cube volumes) and 232 rows (mutations). Example rows:
      Mutation             Cube 3      Cube 4      Cube 11
      r158l_s227f_n239y    7.445915    10.34335    41.34778
      r249m_n235k_n239y    5.358244    10.58604    37.30913
      r273c_d281g_e285g    7.348197    10.26282    42.42635
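One way to turn a complementarity region into per-cube features is sketched below. The slides do not say how the 39 cubes were laid out, so the cube edge length, the voxel spacing, and the indexing scheme here are assumptions chosen for illustration.

```python
from collections import defaultdict


def cube_volumes(voxel_centers, cube_edge, voxel_spacing):
    """Map each cube index to the region volume that falls inside it.

    Each voxel center is assigned to the axis-aligned cube that contains it,
    and every voxel contributes voxel_spacing**3 to that cube's volume.
    """
    voxel_volume = voxel_spacing ** 3
    volumes = defaultdict(float)
    for x, y, z in voxel_centers:
        cube = (int(x // cube_edge), int(y // cube_edge), int(z // cube_edge))
        volumes[cube] += voxel_volume
    return dict(volumes)


# Toy region of three voxels (coordinates in Angstroms, made up).
centers = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (4.5, 0.5, 0.5)]
features = cube_volumes(centers, cube_edge=4.0, voxel_spacing=1.0)
```

Each key of `features` plays the role of one "Cube N" column, and its value is the feature for that cube in one mutation's row.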
  • 41.  A “true positive” is a correctly recognized active mutant. A “true negative” is a correctly recognized inactive mutant.
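With "active" as the positive class, sensitivity and specificity follow from the four confusion counts. The sketch below uses made-up labels, not results from this study.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN, treating "active" as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "active" and p == "active" for a, p in pairs)
    tn = sum(a == "inactive" and p == "inactive" for a, p in pairs)
    fp = sum(a == "inactive" and p == "active" for a, p in pairs)
    fn = sum(a == "active" and p == "inactive" for a, p in pairs)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of truly active mutants that were predicted active."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly inactive mutants that were predicted inactive."""
    return tn / (tn + fp)


# Illustrative labels only.
actual = ["active", "active", "inactive", "inactive", "inactive"]
predicted = ["active", "active", "active", "inactive", "inactive"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
```

A false positive here is an inactive mutant predicted to be active, which is the error that lowers specificity.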
  • 42.  1) Who are we and what do we do?  2) DNA and p53: Basics  3) Research problem  4) Approach 1: electrostatic complementarity across whole region.  5) Approach 2: complementarity at the 14 binding amino acids.  6) Approach 3: Cubes  7) Conclusions
  • 43.  Volumetric analysis of electrostatic complementarity regions holds promising applications to p53 study.  A prediction algorithm with high sensitivity is possible from analysis of cube data.  Our hope is that, using more refined analysis techniques, the prediction algorithm can be made even more specific.
  • 44.  Support Vector Machines (SVM) can be used to learn a decision boundary that separates active mutants from inactive mutants based on cube volumes.
  • 48. (Two histograms of relative frequency of active and inactive mutants versus complementarity region volume. First panel: “Intersection Volumes +1”, x axis 360 to 600. Second panel: “More Binding Acid regions +1”, x axis 210 to 315.)
  • 49.  In both histograms, it is clear that there is a certain lower and upper bound that all active mutants fall between.  If these thresholds can be detected, it is possible to accurately predict inactive p53 mutants.  Using the entire intersection volume for this purpose is not as accurate.
  • 50.  For each cube, we calculate the lower threshold (minimum volume across all mutations in the first training set) and the upper threshold (maximum volume across all mutations in the first training set). Example:
      Mutation             Cube 3      Cube 4      Cube 11
      r158l_s227f_n239y    7.445915    10.34335    41.34778
      r249m_n235k_n239y    5.358244    10.58604    37.30913
      r273c_d281g_e285g    7.348197    10.26282    42.42635
      Lower threshold      5.358244    10.26282    37.30913
      Upper threshold      7.445915    10.58604    42.42635
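The threshold computation can be reproduced on the slide's three example rows. This is a minimal sketch; the function name and the dictionary layout are illustrative, not the original code.

```python
# Cube volumes copied from the slide's example rows (Cube 3, Cube 4, Cube 11).
training_rows = {
    "r158l_s227f_n239y": [7.445915, 10.34335, 41.34778],
    "r249m_n235k_n239y": [5.358244, 10.58604, 37.30913],
    "r273c_d281g_e285g": [7.348197, 10.26282, 42.42635],
}


def cube_thresholds(rows):
    """Return (lower, upper): the per-cube minimum and maximum over all rows."""
    columns = list(zip(*rows.values()))  # transpose rows into per-cube columns
    lower = [min(column) for column in columns]
    upper = [max(column) for column in columns]
    return lower, upper


lower, upper = cube_thresholds(training_rows)
```

The resulting lists match the "Lower threshold" and "Upper threshold" rows shown on the slide.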
  • 51.  The remaining data set is divided into two parts: a second training set to determine which cubes indicate inactivity, and a test set to analyze the predictive power of the chosen cubes.  For each mutation, 39 true/false values (one for each cube) are computed. Each cube with a volume above or below the thresholds set for that particular cube in stage 1 gets a “true” value.
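The true/false values described here can be computed as below. The thresholds are the slide's example values for three cubes; the mutant's volumes are hypothetical.

```python
def violation_flags(volumes, lower, upper):
    """True where a cube's volume falls below the lower or above the upper threshold."""
    return [v < lo or v > hi for v, lo, hi in zip(volumes, lower, upper)]


# Thresholds for Cube 3, Cube 4, Cube 11 from the slide's example table.
lower = [5.358244, 10.26282, 37.30913]
upper = [7.445915, 10.58604, 42.42635]

mutant = [6.9, 11.2, 30.0]  # hypothetical cube volumes for one mutation
flags = violation_flags(mutant, lower, upper)
n_violations = sum(flags)  # True counts as 1
```

In the full data set each mutation would have 39 such flags, one per cube.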
  • 52. Stage 1: Training Set 1 (only actives) is taken from the total dataset and used to generate the thresholds. Stage 2: the dataset minus Training Set 1 is split into Training Set 2 and the Test Set. Thresholds generated in Training Set 1 are used to extract inactivity indicators from Training Set 2, and the inactivity indicators are then applied to the test set.
  • 53.  Active mutants have, on average, far fewer “true” values (threshold violations) than inactive mutants. Depending on the training and test sets, the worst active mutants had between 3 and 6 true values, while inactive mutants can have more than 13. Two panels on the slide, captioned “# of Active Mutants with n threshold violations” and “# of Inactive Mutants with n threshold violations”, give the following counts.
      Panel with 116 violators: 1 violation: 46; 2 violations: 29; 3 violations: 25; 4 violations: 9; 5 violations: 5; 6 violations: 1; 7 violations: 0; 8 violations: 1; 9 to 13 violations: 0.
      Panel with 9 violators: 1 violation: 6; 2 violations: 2; 3 violations: 1; 4 to 6 violations: 0.
  • 54.  A simple example: Mutations A-E have a violation in each of cubes 1-5 where they have a “TRUE”.  For each cube, the number of mutations with true values in that cube and 0, 1, 2, etc. other violations is counted. (Table of mutations versus cubes 1-5: A, B, and C each have four TRUE entries; D has one; E has two.)
  • 55. Number of violators per cube, by number of other violations:
      Cube    w/ 0 other    w/ 1 other    w/ 2 other    w/ 3 other
      1       3             3             2             2
      2       3             2             2             2
      3       3             3             3             3
      4       3             3             3             3
      5       3             3             2             2
       The cubes marked in red on the slide are the cubes of interest: the number of violators in these cubes does not change between 0 and 3 other threshold violations (cubes 3 and 4 in this example).
  • 56.  For each cube, the number of mutations in the second training set with a threshold violation in that cube was counted.  Separate counts were generated for mutations that had a violation in that cube along with violations in 1, 2, 3, etc. other cubes.
  • 57.  If all active mutations have 3 or fewer threshold violations, then cubes whose count stays the same between 0 and 3 other violations are violated only by inactive mutations.  These inactivity indicators are used to classify the mutants in the test set as either active or inactive.
  • 58.  The algorithm searches for inactive mutants. Any mutant with more than three threshold violations is automatically counted as inactive.  From the remaining pool, all mutants with threshold violations in the cubes that do not have their counts change between 0 and 3 concomitant violations are also counted as inactive.
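The two rules on this slide can be sketched as follows. The sketch rests on an assumption the transcript does not state outright: that the count for "k other violations" is cumulative, meaning violators of the cube with at least k other violations. Under that reading, the hypothetical assignment of violations to mutations A-E below reproduces the example counts table from the earlier slide. All function names are illustrative.

```python
def cumulative_counts(violations, max_other=3):
    """counts[cube][k] = number of mutants violating `cube` that also have
    at least k other violations (assumed cumulative reading of the table)."""
    counts = {}
    for cubes in violations.values():
        others = len(cubes) - 1
        for cube in cubes:
            row = counts.setdefault(cube, [0] * (max_other + 1))
            for k in range(min(others, max_other) + 1):
                row[k] += 1
    return counts


def indicator_cubes(counts):
    """Cubes whose count is identical for every k from 0 to max_other."""
    return {cube for cube, row in counts.items() if len(set(row)) == 1}


def classify(cubes, indicators, max_violations=3):
    """Apply the two rules: too many violations, or any indicator cube violated."""
    if len(cubes) > max_violations:
        return "inactive"
    if cubes & indicators:
        return "inactive"
    return "active"


# Hypothetical second training set: mutation -> set of violated cubes.
training_set_2 = {
    "A": {1, 2, 3, 4},
    "B": {1, 3, 4, 5},
    "C": {2, 3, 4, 5},
    "D": {2},
    "E": {1, 5},
}

counts = cumulative_counts(training_set_2)
indicators = indicator_cubes(counts)
```

With this data, cubes 3 and 4 are the only ones whose counts stay constant, so a test mutant with a single violation in cube 3 would be classified inactive while one with a single violation in cube 1 would be classified active.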
  • 59.  Cubes that guarantee inactivity have still not been completely identified.  The best inactivity indicators generated thus far are not the most sensitive.  Next steps involve either a refinement of the threshold violations algorithm or the use of other methods.

Editor's notes

  1. I chose this title because that’s the title Brian submitted to the NSF when getting funding. SAY THAT! Say that the various components of the title will be explained as we go along.
  2. ADD SARA GROGAN!!
  3. Say in your presentation: “Don’t worry, I’ll explain what each one of these terms means.”
  4. Cell division suspended in the G1/S phase. Apoptosis initiated when cell sustains too much damage.
  5. Active/inactive classification is binary. Did not consider partial loss of function mutations.
  6. Changes to these amino acids can drastically alter the structure of the binding site.
  7. R273h/n263v is a rescue mutation. The phenomenon of mutations in p53 counteracting each other has been described as rescue mutations, and is heavily researched. We just concern ourselves with the activity classification of the protein.
  10. Because we only analyze substitutions of DNA-binding amino acids, we use 77 of the actives and 155 of the inactives. A Protein Data Bank (pdb) file contains two kinds of information about a protein: the sequence of its amino acids and its 3D structure.
  11. 1TSR.pdb contains three chains, each a copy of the p53 DNA binding domain. Chain B interacts directly with the binding region of DNA.
  12. What conformational changes to p53’s contact site determine activity vs. inactivity? Can we generalize our findings to any mutation within the DNA binding region?
  13. An electrostatic isopotential is a surface of constant electrostatic potential around a molecule. The +1 isopotential encloses all regions with a potential of +1 kT/e or greater, and the -1 isopotential encloses all regions with a potential of -1 kT/e or less, where k = Boltzmann constant, T = temperature, and e = electron charge. The regions where the positive isopotential of one molecule overlaps the negative isopotential of the other are the regions of electrostatic complementarity, which govern binding.
  15. Both of these programs were created by Prof. Brian Chen
  16. We removed the -1 isopotential of p53 from the image and made the p53 +1 isopotential transparent
  17. This picture is the visual representation of such a computation
  18. Unfortunately, calculating the volume of the intersection region does not yield viable classification. Explain every feature about this graph: x axis, y axis, blue vs red. For example: about ¼ of all active mutants have EC region volumes around 470 cubic angstroms. “Am I happy with this diagram? Of course not (transition to next slide).”
  19. The choice of a 5 Angstrom cutoff produced the clearest results.
  20. By looking at a small subset of the electrostatic complementarity region, we are able to get a much clearer division between active and inactive volumes, at least for some part of the histogram. Unfortunately, the biggest part of the histogram is still tangled. What this shows is that by subdividing the EC region into subsections, we can improve our results. Hence the next approach.
  21. 39 cubic sub regions were generated. The volume of the electrostatic complementarity region contained in each cube was computed.
  22. Say the following: “After that, the learning algorithm I have developed becomes complicated, so allow me to go to the results. If anyone is interested in the algorithm I am at your disposal to take it offline.”
  23. Obviously I am still not happy with these results. While the sensitivity is good, the specificity is not: there are far too many false positives.
  24. In the graphs of both approaches, we notice the existence of very low volumes that only inactive mutants tend to have. The Approach 1 graph indicates the existence of certain extraneous regions of complementarity that are detrimental to binding. In Approach 2 these regions are removed because we are only looking at the complementarity region needed for binding.
  25. These maximum and minimum volumes determine the upper and lower volume thresholds, respectively, for each cube. These thresholds are used in the remaining stages to classify mutations in the test set.