1. Res Potentia as a route to understanding function and evolution of cellular networks
Adam Pah
NetSci
June 21, 2012
2. Where do we stand and how can we do better?
We are generating biological data faster than ever.
But generating data is only one part; we still have to convert it into actual usable knowledge.
[Diagram labels: Data, Knowledge]
6. Why study metabolism?
• My goal is to create a generalizable framework for understanding cellular networks
• I use metabolism because:
  • The data fidelity, while not perfect, is far better than for other cellular networks
  • We can use metabolism as a test case to help develop an understanding of cellular networks
  • There is also the ability to produce metabolites or chemicals that are of interest
11. How do we construct a metabolic network?
Metabolic networks are constructed from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for each organism, where:
• Metabolites are connected if they are part of the main reaction pair
• Substrates are connected to products only
Example reaction: UDP-Glucose + H2O + 2 NAD+ → UDP-Glucuronate + 2 NADH + 2 H+
Pairs shown: UDP-Glucose and UDP-Glucuronate; 2 NAD+ and 2 NADH
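The construction rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the talk: the `reactions` list and the `build_metabolite_graph` helper are hypothetical, the pairs are typed in by hand instead of being read from KEGG, and treating links as undirected is an assumption.

```python
from collections import defaultdict

# Hypothetical input: each reaction carries its reaction pairs as
# (substrate, product) tuples. Entered by hand here, not read from KEGG.
reactions = [
    {
        "name": "UDP-Glucose + H2O + 2 NAD+ -> UDP-Glucuronate + 2 NADH + 2 H+",
        "pairs": [("UDP-Glucose", "UDP-Glucuronate"), ("NAD+", "NADH")],
    },
]


def build_metabolite_graph(reactions):
    """Return an adjacency map linking each substrate to its paired product.

    Links are stored in both directions (an assumption of this sketch).
    """
    graph = defaultdict(set)
    for reaction in reactions:
        for substrate, product in reaction["pairs"]:
            graph[substrate].add(product)
            graph[product].add(substrate)
    return graph


graph = build_metabolite_graph(reactions)
print(sorted(graph["UDP-Glucose"]))
```

Because only paired metabolites are linked, H2O and H+ never enter the graph, and UDP-Glucose is not linked to NADH even though both appear in the same reaction.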
19. How do we construct a framework?
[Diagram labels: Methanococcus maripaludis, Escherichia coli, Homo sapiens, Arabidopsis thaliana]
Current knowledge of the realm of actuals: ‘Res Extensa’
Realm of possibles: ‘Res Potentia’
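One possible reading of this diagram, stated here as an assumption rather than as the talk's definition: the realm of actuals is the set of reactions currently annotated for each organism, and the realm of possibles is the union of reactions seen in any organism. The reaction labels below are made up for illustration.

```python
# Hypothetical reaction sets per organism; the reaction labels are made up.
res_extensa = {
    "Methanococcus maripaludis": {"R1", "R2"},
    "Escherichia coli": {"R1", "R3", "R4"},
    "Homo sapiens": {"R3", "R5"},
    "Arabidopsis thaliana": {"R2", "R5", "R6"},
}

# Assumption: the realm of possibles is the union of every organism's reactions.
res_potentia = set().union(*res_extensa.values())

# Reactions that are possible but not currently annotated in each organism.
unannotated = {
    organism: res_potentia - reactions
    for organism, reactions in res_extensa.items()
}

print(sorted(unannotated["Homo sapiens"]))
```

Under this reading, every unannotated reaction is a candidate that can be tested against the organism's proteins.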
30. How much of a need exists to correct databases?
In the course of 1 year, for 979 organisms in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database:
• 88,000 metabolites have been added as annotations
• 31,000 metabolites that were annotated have been removed
• Resulting in over 100 changes per organism
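The per-organism figure follows from the two totals: (88,000 + 31,000) / 979 is about 121.6 changes per organism. The sketch below shows how such counts could be obtained by comparing two annotation snapshots. The `annotation_changes` helper and the metabolite labels are hypothetical; only the three totals come from the slide.

```python
def annotation_changes(old, new):
    """Return (added, removed) metabolite sets between two snapshots."""
    return new - old, old - new


# Hypothetical snapshots of one organism's annotated metabolites.
old_snapshot = {"M1", "M2", "M3"}
new_snapshot = {"M1", "M3", "M4", "M5"}
added, removed = annotation_changes(old_snapshot, new_snapshot)

# Totals quoted on the slide.
total_added = 88_000
total_removed = 31_000
organisms = 979
changes_per_organism = (total_added + total_removed) / organisms

print(sorted(added), sorted(removed), round(changes_per_organism, 1))
```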
34. How can we make predictions?
For every reaction there is a set of enzyme sequences that we can compare to each organism's set of proteins to see how well that reaction ‘fits’.
[Diagram: Reaction1 (Annotated). Organism1 proteins: Protein1 to Protein4. Reaction1 enzymes: Enzyme1 from Organism1 to Organism4.]
38. How can we make predictions?
[Diagram: Organism1 proteins (Protein1 to Protein4) are compared with Reaction1 enzymes (Enzyme1 from Organism1 to Organism4) using Protein BLAST for enzyme sequences.]
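A minimal sketch of how the results of such a comparison could be collected, assuming the search is run with BLAST+ `blastp` and written in its 12-column tabular format (`-outfmt 6`). The slides name only Protein BLAST, so the output format, the `best_evalues` helper, and the sample hits are all assumptions made for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical hits: Reaction1 enzymes queried against Organism1 proteins.
SAMPLE = (
    "Enzyme1_Organism2\tProtein1_Organism1\t62.5\t320\t118\t2\t1\t318\t5\t324\t1e-120\t350\n"
    "Enzyme1_Organism2\tProtein3_Organism1\t28.0\t150\t100\t4\t10\t160\t20\t170\t0.002\t38.5\n"
    "Enzyme1_Organism3\tProtein2_Organism1\t25.1\t90\t60\t3\t5\t95\t40\t130\t5.0\t22.1\n"
)


def best_evalues(handle):
    """Map each query enzyme to the smallest E-value among its hits."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        query = hit["qseqid"]
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


best = best_evalues(io.StringIO(SAMPLE))
print(best)
```

A smaller E-value means a better match, so keeping the minimum per enzyme records how well the organism's proteins can account for that enzyme.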
41. How can we make predictions?
[Diagram: Reaction1 (Annotated). Organism1 proteins (Protein1 to Protein4) are compared with Reaction1 enzymes (Enzyme1 from Organism1 to Organism4).]
Match E-values: 0.0, 10^-3, 10^-4, 5.0, 10^-2
42. How can we make predictions?
[Diagram: Organism1 proteins (Protein1 to Protein4) are compared with Reaction1 enzymes (Enzyme1 from Organism1 to Organism4).]
Match E-values: 0.0, 10^-3, 10^-4, 5.0, 10^-2
[Plot: Fraction of Matches (0.0 to 1.0) against match quality, from Excellent Matches to Poor Matches]
43. How can we make predictions?
[Plot: Fraction of Matches (0.0 to 1.0) against match quality, from Excellent Matches to Poor Matches, for Reaction1 (Annotated) and Reaction2 (Unannotated)]
45. How can we make predictions?
Repeat this for all 3328 reactions using 5.94 million enzyme sequences in 873 organisms.
[Plot: Fraction of Matches (0.0 to 1.0) against match quality, from Excellent Matches to Poor Matches]
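The curve on these slides can be approximated as a cumulative fraction: for each E-value cutoff, the share of a reaction's enzyme sequences whose best match is at least that good. The sketch below uses the five E-values shown earlier in the deck. The `fraction_of_matches` helper and the cutoffs are hypothetical, and how the talk actually builds its curve is not stated.

```python
def fraction_of_matches(evalues, cutoffs):
    """For each cutoff, the fraction of E-values at or below that cutoff."""
    if not evalues:
        raise ValueError("at least one E-value is required")
    total = len(evalues)
    return [sum(e <= cutoff for e in evalues) / total for cutoff in cutoffs]


# E-values as listed on the slide.
evalues = [0.0, 1e-3, 1e-4, 5.0, 1e-2]

# Illustrative cutoffs, ordered from excellent to poor matches.
cutoffs = [1e-4, 1e-3, 1e-2, 1.0, 10.0]

curve = fraction_of_matches(evalues, cutoffs)
print(curve)
```

A reaction that fits the organism well rises quickly at the strict cutoffs; a poorly fitting reaction only accumulates matches at the permissive end.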
49. How do we validate our results?
• We have one starting dataset: metabolic networks from KEGG 2009
• We have our predicted networks and their changes to this dataset (Predicted Changes)
• I also have the entire KEGG dataset for the 2 years following that date (KEGG Changes)
• We can then compare how well each set of changes does in correcting the networks
• Ideally the networks should make sense and be as connected as reasonably possible
54. Validate by promoting connectedness
We can test how well the actual changes in the database complete the networks and fill in their gaps.
[Plot: Fraction of Gaps Filled (0.00 to 0.12) against Gap Size (1 to 5), for KEGG Changes, Predicted Changes, and Random]
60. Validate by promoting connectedness
We can test and see how the actual changes in the database create gaps.
[Plot: relative fraction of removed reactions that create additional components (axis from -0.1 to 0.1), for RPF Predicted Deletions and KEGG 2011 Deletions]
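The quantity on this axis can be illustrated with a small connected-components check: remove one reaction at a time and see whether the network splits. This is a sketch under stated assumptions, not the talk's procedure. Each reaction is treated as a single undirected edge, the toy network is made up, and the comparison with a baseline that makes the fraction "relative" is left out.

```python
def count_components(nodes, edges):
    """Count connected components with an iterative depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def fraction_creating_components(nodes, edges, removed):
    """Fraction of removed edges whose individual removal adds a component."""
    baseline = count_components(nodes, edges)
    splitting = 0
    for edge in removed:
        remaining = [e for e in edges if e != edge]
        if count_components(nodes, remaining) > baseline:
            splitting += 1
    return splitting / len(removed)


# Toy network: a triangle A-B-C with D attached to C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
removed = [("A", "B"), ("C", "D")]

fraction = fraction_creating_components(nodes, edges, removed)
print(fraction)
```

Removing A-B leaves the triangle connected through C, while removing C-D isolates D, so half of the removals in this toy case create an additional component.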
65. What did we learn?
Considering reactions in the context of the Res Potentia enhances the ability to correct and close gaps in organismal networks.
Now we can begin to analyze and understand more complex features of these networks.
67. Acknowledgements
• Luis Amaral
• Irmak Sirer, Pat McMullen, Sam Seaver, Erin Sawardecker
With financial support from:
• Northwestern/NIH Biotechnology Training Grant
• Chicago Biomedical Consortium