1. Large-scale structure of metabolic networks:
Role of Biochemical and Functional
constraints
Areejit Samal
Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ
Paris-Sud, Orsay, France
and
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Areejit Samal
2. Outline
Architectural features of metabolic networks
Issues with existing null models for metabolic networks
Our method to generate realistic randomized metabolic
networks using Markov Chain Monte Carlo (MCMC)
sampling and Flux Balance Analysis (FBA)
Biological function is a main driver of large-scale structure in
metabolic networks
Areejit Samal
3. Architectural features of real-world networks
Small-world Scale-free
Watts and Strogatz, Nature (1998) Barabasi and Albert, Science (1999)
Small Average Path Length Power law degree distribution
& High Local Clustering
Scale-free graphs can be generated using a simple model incorporating:
Growth & Preferential attachment scheme
Areejit Samal
4. Architectural features of real-world networks
Small-world Scale-free
Watts and Strogatz, Nature (1998) Barabasi and Albert, Science (1999)
Small Average Path Length Power law degree distribution
& High Local Clustering
Scale-free graphs can be generated using a simple model incorporating:
Growth & Preferential attachment scheme
Areejit Samal
5. Large-scale structure of metabolic networks
Small-world, Scale-free and Hierarchical Organization
Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)
Small Average Path Length, High Local Clustering, Power-law degree distribution
Bow-tie architecture
Directed graph with a giant component of
strongly connected nodes along with associated
IN and OUT component
Bow-tie architecture of the metabolism is similar
to that found for the WWW by Broder et al.
Ma and Zeng, Bioinformatics (2003)
Areejit Samal
6. Large-scale structure of metabolic networks
Small-world, Scale-free and Hierarchical Organization
Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)
Small Average Path Length, High Local Clustering, Power-law degree distribution
Bow-tie architecture
Directed graph with a giant component of
strongly connected nodes along with associated
IN and OUT component
Bow-tie architecture of the metabolism is similar
to that found for the WWW by Broder et al.
Ma and Zeng, Bioinformatics (2003)
Areejit Samal
7. Network motifs and Evolutionary design
Sub-graphs that are over-
represented in real networks
compared to randomized
networks with same degree
sequence.
Certain network motifs were shown to
perform important information processing
tasks.
For example, the coherent Feed Forward
Loop (FFL) can be used to detect
persistent signals.
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)
Areejit Samal
8. Network motifs and Evolutionary design
Sub-graphs that are over-
represented in real networks
compared to randomized
networks with same degree
sequence.
Certain network motifs were shown to
perform important information processing
tasks.
For example, the coherent Feed Forward
Loop (FFL) can be used to detect
persistent signals.
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)
Areejit Samal
9. “Null hypothesis” based approach for identifying design
principles in real-world networks
Statistically significant network properties are
deduced as follows:
300
1) Measure the ‘chosen property’ (E.g., 250
Investigated
motif significance profile, average network
200
clustering coefficient, etc.) in the
Frequency
p-value
investigated real network. 150
100
2) Generate randomized networks with 50
structure similar to the investigated real
network using an appropriate null 0
0.60 0.62 0.64 0.66 0.68 0.70
model. Chosen Property
3) Use the distribution of the ‘chosen
property’ for the randomized networks
to estimate a p-value.
Areejit Samal
10. Edge-exchange algorithm: commonly-used null model
Randomized networks
preserve the degree at
each node and the degree
sequence.
Investigated Network
After 1 exchange
After 2 exchanges
Reference:
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004); Maslov & Sneppen Science (2002)
Areejit Samal
11. Carefully posed null models are required for deducing
design principles with confidence
C. elegans Neuronal Network:
Close by neurons have a greater
chance of forming a connection
than distant neurons
A null model incorporating this
spatial constraint gives different
results compared to generic edge-
exchange algorithm.
Artzy-Randrup et al Science (2004)
Areejit Samal
12. Carefully posed null models are required for deducing
design principles with confidence
C. elegans Neuronal Network:
Close by neurons have a greater
chance of forming a connection
than distant neurons
A null model incorporating this
spatial constraint gives different
results compared to generic edge-
exchange algorithm.
Artzy-Randrup et al Science (2004)
Areejit Samal
13. Edge-exchange algorithm: case of bipartite metabolic
networks
asp-L cit
ASPT CITL
nh4 fum ac oaa
ASPT: asp-L → fum + nh4
CITL: cit → oaa + ac
Areejit Samal
14. Edge-exchange algorithm: case of bipartite metabolic
networks
cit asp-L cit Null-model used in
asp-L
many studies
including: Guimera &
Amaral Nature (2005);
CITL ASPT* CITL* Samal et al BMC
ASPT
Bioinformatics (2006)
ac oaa nh4 fum ac oaa
nh4 fum
ASPT: asp-L → fum + nh4 ASPT*: asp-L → ac + nh4
CITL: cit → oaa + ac CITL*: cit → oaa + fum
Preserves degree of each node in the network
But generates fictitious reactions that violate
mass, charge and atomic balance satisfied by
real chemical reactions!!
Note that fum has 4 carbon atoms while ac has
2 carbon atoms in the example shown.
Areejit Samal
15. Edge-exchange algorithm: case of bipartite metabolic
networks
cit asp-L cit Null-model used in
asp-L
many studies
including: Guimera &
Amaral Nature (2005);
CITL ASPT* CITL* Samal et al BMC
ASPT
Bioinformatics (2006)
ac oaa nh4 fum ac oaa
nh4 fum
ASPT: asp-L → fum + nh4 ASPT*: asp-L → ac + nh4
CITL: cit → oaa + ac CITL*: cit → oaa + fum
Preserves degree of each node in the network
But generates fictitious reactions that violate
mass, charge and atomic balance satisfied by
real chemical reactions!!
Note that fum has 4 carbon atoms while ac has
2 carbon atoms in the example shown.
Areejit Samal
16. Edge-exchange algorithm: case of bipartite metabolic
networks
cit asp-L cit Null-model used in
asp-L
many studies
including: Guimera &
Amaral Nature (2005);
CITL ASPT* CITL* Samal et al BMC
ASPT
Bioinformatics (2006)
ac oaa nh4 fum ac oaa
nh4 fum
ASPT: asp-L → fum + nh4 ASPT*: asp-L → ac + nh4
CITL: cit → oaa + ac CITL*: cit → oaa + fum
Preserves degree of each node in the network
But generates fictitious reactions that violate A common generic null model
mass, charge and atomic balance satisfied by (edge exchange algorithm) is
real chemical reactions!! unsuitable for different types
of real-world networks.
Note that fum has 4 carbon atoms while ac has
2 carbon atoms in the example shown.
Areejit Samal
17. Null models for metabolic networks: Blind-watchmaker network
Areejit Samal
18. Null models for metabolic networks: Blind-watchmaker network
Areejit Samal
19. Randomizing metabolic networks
We propose a new method using Markov Chain Monte Carlo (MCMC) sampling and
Flux Balance Analysis (FBA) to generate meaningful randomized ensembles for
metabolic networks by successively imposing biochemical and functional constraints.
Ensemble R
Increasing level of constraints
Given no. of valid Constraint I
biochemical reactions
+ Ensemble Rm Constraint II
Fix the no. of metabolites
+ Ensemble uRm Constraint III
Exclude Blocked Reactions
+ Ensemble uRm-V1
Functional constraint of
Constraint IV
growth on specified environments
Areejit Samal
20. Comparison with E. coli: Large-scale structural properties
of interest
In this work, we will generate randomized metabolic networks
using only reactions in KEGG and compare the structural
properties of randomized networks with those of E. coli.
We study the following large-scale structural properties:
• Metabolite Degree distribution
• Clustering Coefficient
• Average Path Length Scale Free and Small World
Ref: Jeong et al (2000) Wagner & Fell (2001)
• Pc: Probability that a path exists between two nodes in
the directed graph
• Largest strongly component (LSC) and the union of
LSC, IN and OUT components
Bow-tie architecture of the
E. coli metabolic network has m (~750) metabolites Internet and metabolism
and r (~900) reactions. Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
Areejit Samal
21. glc-D HEX1: atp + glc-D → adp + g6p + h
atp
PGI: g6p ↔ f6p
Different Graph-theoretic Representations of the
PFK: atp + f6p → adp + fdp + h
HEX1 Currency Other
Metabolite Metabolite
Irreversible Reversible
g6p Reaction
Metabolic Network
Reaction
PGI
glc-D
f6p
g6p
PFK
f6p
h fdp adp
fdp
(a) Bipartite Representation (b) Unipartite Representation
22. glc-D HEX1: atp + glc-D → adp + g6p + h
atp
PGI: g6p ↔ f6p
Different Graph-theoretic Representations of the
PFK: atp + f6p → adp + fdp + h
HEX1 Currency Other
Metabolite Metabolite
Degree Distribution
Irreversible Reversible
g6p Reaction
Metabolic Network
Reaction
PGI
glc-D
Clustering coefficient
Average Path Length
Largest Strong Component
f6p
g6p
PFK
f6p
h fdp adp
fdp
(a) Bipartite Representation (b) Unipartite Representation
23. glc-D HEX1: atp + glc-D → adp + g6p + h
atp
PGI: g6p ↔ f6p
Different Graph-theoretic Representations of the
PFK: atp + f6p → adp + fdp + h
HEX1 Currency Other
Metabolite Metabolite
Degree Distribution
Irreversible Reversible
g6p Reaction
Metabolic Network
Reaction
PGI
glc-D
Clustering coefficient
Average Path Length
Largest Strong Component
f6p
g6p
The values obtained for different network measures depend on the list
PFK
f6p
of currency metabolites used to generate the unipartite graph of
metabolites starting from the bipartite graph. In this work, we have
h
usedfdpwell defined biochemically sound list of currency metabolites to
a adp
fdp
construct unipartite graph corresponding to each metabolic network.
(a) Bipartite Representation (b) Unipartite Representation
24. Constraint I: Given number of valid biochemical
reactions
Instead of generating fictitious reactions using
edge exchange mechanism, we can restrict to
>6000 valid reactions contained within KEGG
database.
We use the database of 5870 mass balanced
reactions derived from KEGG by Rodrigues and
Wagner. For simplicity, I will refer to this database
in the talk as ‘KEGG’
Areejit Samal
25. Global Reaction Set
Nutrient metabolites Biomass composition
The set of external The E. coli biomass
metabolites in iJR904 composition of iJR904
were allowed for was used to determine
uptake/excretion in the + viability of genotypes.
global reaction set.
We can implement Flux
Balance Analysis (FBA)
within this database.
KEGG Database + E. coli iJR904
Areejit Samal
26. Genome-scale metabolic models: Flux Balance Analysis (FBA)
List of metabolic reactions with
stoichiometric coefficients
Growth rate for
Flux Balance the given medium
Biomass composition Analysis (FBA) ‘Biomass Yield’
Fluxes of all
reactions
Set of nutrients in the
environment
Advantages
FBA does not require enzyme kinetic information which is not known for most reactions.
Disadvantages
FBA cannot predict internal metabolite concentrations and is restricted to steady states.
Basic models do not account for metabolic regulation.
Reference: Varma and Palsson, Biotechnology (1994); Price et al Nature Reviews Microbiology (2004)
Areejit Samal
27. Bit string representation of metabolic networks within the global
reaction set
KEGG Database E. coli
R1: 3pg + nad → 3php + h + nadh 1
Present - 1
R2: 6pgc + nadp → co2 + nadph + ru5p-D 0
R3: 6pgc → 2ddg6p + h2o 1 Absent - 0
. .
. .
. .
6000
C900
RN: --------------------------------- .
Contains r reactions
N = 5870 reactions within KEGG (1’s in the bit string)
(or global reaction set)
The E. coli metabolic network or any random network of reactions within KEGG can be
represented as a bit string of length N with exactly r entries equal to 1.
r is the number of reactions in the E. coli metabolic network.
Areejit Samal
28. Ensemble R: Random networks within KEGG with same number of
reactions as E. coli
Random subsets or networks with r reactions
To compare with E. coli, we generate an
ensemble R of random networks as
KEGG with
N=5870 follows:
reactions
Pick random subsets with exactly r (~900)
reactions within KEGG database of N
(~6000) reactions.
r is the no. of reactions in E. coli.
1 0 1 1
0 0 1 0
0 1 1 1 Huge no. of networks possible within
. . . . KEGG with exactly r reactions
. . . . 6000
~ C900
. . . .
. . . .
Fixing the no. of reactions is equivalent to
Each random network or bit string fixing genome size.
contains exactly r reactions (1’s in
the string) like in E. coli.
Areejit Samal
29. Metabolite degree distribution
A degree of a metabolite is the number of reactions in which the metabolite participates in the
network.
100
10-1
10-2
P(k)
10-3
10-4
E. coli
10-5
10-6
1 10 100 1000
k
E. coli: = 2.17;
Most organisms: is 2.1-2.3
Reference:
Jeong et al (2000);
Wagner and Fell (2001)
Areejit Samal
30. Metabolite degree distribution
A degree of a metabolite is the number of reactions in which the metabolite participates in the
network.
100
100
10-1
10-1
10-2
10-2
P(k)
P(k)
10-3
10-3
10-4
10-4
E. coli
E. coli
10-5 Random
10-5
10-6
10-6
1 10 100 1000
1 10 100 1000
k
k
E. coli: = 2.17; Random networks in
Most organisms: is 2.1-2.3 ensemble R: = 2.51
Reference:
Jeong et al (2000);
Wagner and Fell (2001)
Areejit Samal
31. Metabolite degree distribution
A degree of a metabolite is the number of reactions in which the metabolite participates in the
network.
100 100
100
10-1
-1 10-1
10
10-2
10-2
10-2 10-3
P(k)
P(k)
P(k)
10-3 10-4
10-3
10-5
10-4
10-4
E. coli
10-6 E. coli
E. coli Random
10-5 Random
10-5 10-7 KEGG
10-6 10-8
10-6
1 10 100 1000 1 10 100 1000
1 10 100 1000
k k
k
E. coli: = 2.17; Random networks in Complete KEGG:
Most organisms: is 2.1-2.3 ensemble R: = 2.51 = 2.31
Reference:
Jeong et al (2000);
Wagner and Fell (2001)
Any (large enough) random subset of reactions in KEGG has Power Law degree
distribution !!
Areejit Samal
32. Reaction Degree Distribution
The degree of a reaction is the number of metabolites that participate in it.
3000
0 Currency
1 Currency
2500 2 Currency
3 Currency atp adp
4 Currency
5 Currency
2000
6 Currency R
Frequency
7 Currency
1500 A B
1000 In each reaction, we have counted the number of
currency and other metabolites which is shown in
500 the figure.
Most reactions involve 4 metabolites of which 2
0
0 2 4 6 8 10 12
are currency metabolites and 2 are other
Number of substrates in a reaction
metabolites.
The reaction degree distribution is bell shaped with typical reaction in the network
involving 4 metabolites.
The nature of reaction degree distribution is very different from the metabolite
degree distribution which follows a power law.
Areejit Samal
33. Scale-free versus Scale-rich network:
Origin of Power laws in metabolic networks
Tanaka and Doyle have suggested a classification of metabolites into three
categories based on their biochemical roles
(a) ‘Carriers’ (very high degree) atp adp
(b) ‘Precursors’ (intermediate degree)
R
(c) ‘Others’ (low degree)
A B
It has been argued that gene duplication
and divergence at the level of protein
interaction networks can lead to
observed power laws in metabolic
networks. However, the presence of
ubiquitous (high degree) ‘currency’
metabolites along with low degree
‘other’ metabolites in each reaction can
explain the observed fat tail in the
metabolite distribution.
Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)
Areejit Samal
34. Network Properties: Given number of valid biochemical
reactions
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
35. Network Properties: Given number of valid biochemical
reactions
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
36. Constraint II: Fix the number of metabolites
Direct sampling of randomized networks with same no. of reactions and metabolites as in
E. coli is not possible.
Markov Chain Monte Carlo (MCMC) sampling of randomized networks
KEGG Database E. coli Step 1 Step 2 Ensemble RM
R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
0 Swap 1 Swap 0 106 Swaps 1 10 Swaps
3
0
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept
. 1 1 0 0 1
. . . Erase . Saving .
. memory of
RN: --------------------------------- . . . . Frequency .
initial
N = 5870 reactions within KEGG network
reaction universe Reject Reject
store store
Reaction swap keeps the number of reactions in network constant
Accept/Reject Criterion of reaction swap:
Accept if the no. of metabolites in the new network is less than or equal to E. coli.
Areejit Samal
37. Network Properties: After fixing number of metabolites
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
38. Network Properties: After fixing number of metabolites
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
39. Constraint III: Exclude Blocked Reactions
Large-scale metabolic networks typically have certain
dead-end or blocked reactions which cannot contribute to
KEGG
growth of the organism.
Unblocked
KEGG Blocked reactions have zero flux for every investigated
chemical environment under any steady-state condition and
do not contribute to biomass growth flux.
We next exclude blocked reactions from our biochemical
universe used for sampling randomized networks.
For Blocked reactions, see Burgard et al Genome Research (2004)
Areejit Samal
40. MCMC Sampling: Exclusion of blocked reactions
Ensemble uRM
KEGG Database E. coli Step 1 Step 2
R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
0 Swap 1 Swap 0 106 Swaps 1 10 Swaps
3
0
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept Erase Saving
. 1 1 0 0 1
. . . memory of . Frequency .
. initial
RN: --------------------------------- . . . network
. .
N = 5870 reactions within KEGG
reaction universe Reject Reject
store store
Accept/Reject Criterion of reaction swap:
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept if
the no. of metabolites in the new network is less than or equal to E. coli.
Areejit Samal
41. Network Properties: After excluding blocked reactions
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
42. Network Properties: After excluding blocked reactions
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
43. Constraint IV: Growth Phenotype
Definition of Phenotype
Genotypes Ability to synthesize Viability
biomass components
0
1
1
. Viable
genotype
.
. Flux Growth
. GA Balance
0 Analysis
No Growth
0 (FBA)
1 Unviable
. genotype
x
.
. Deleted
. GB reaction
Bit string and equivalent network
representation of genotypes
For FBA, see Price et al (2004) for a review
Areejit Samal
44. MCMC Sampling: Growth phenotype in one environment
Ensemble uRM-V1
KEGG Database E. coli Step 1 Step 2
R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
0 Swap 1 Swap 0 106 Swaps 1 10 Swaps
3
0
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept
. 1 1 0 Erase 0 Saving 1
. . . . memory of . Frequency .
RN: --------------------------------- . . . initial . .
network
N = 5870 reactions within KEGG
reaction universe Reject Reject
store store
Accept/Reject Criterion of reaction swap:
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept if
(a) No. of metabolites in the new network is less than or equal to E. coli
(b) New network is able to grow on the specified environment
Areejit Samal
45. Network Properties: Growth phenotype in one
environment
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
46. Network Properties: Growth phenotype in one
environment
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
47. Network Properties: Growth phenotype in multiple
environments
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
48. Network Properties: Growth phenotype in multiple
environments
10
0.10
8
0.08
6
<L>
<C>
0.06
4
0.04
2
0.02
0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT
0.6 0.8
Fraction of nodes
0.6
Pc
0.4
0.4
0.2
0.2
0.0 0.0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli
Probability that a path exists Largest Strong Component
between two nodes
Areejit Samal
49. Global structural properties of real metabolic networks:
Consequence of simple biochemical and functional constraints
Increasing level of constraints
Conjectured by:
A. Wagner (2007) and
B. Papp et al. (2008)
Samal and Martin, PLoS ONE (2011) 6(7): e22295
Areejit Samal
50. Global structural properties of real metabolic networks:
Consequence of simple biochemical and functional constraints
Increasing level of constraints
Conjectured by:
A. Wagner (2007) and
B. Papp et al. (2008)
Samal and Martin, PLoS ONE (2011) 6(7): e22295
Areejit Samal
51. High level of genetic diversity in our randomized ensembles
Any two random networks in our most constrained ensemble uRM-v10 differ in
~ 60% of their reactions.
1 1 1 1
0 0 0 0
1 0 1 0
. . . versus .
versus
. . . .
. . . .
. . . .
E. coli Random network in Random network in Random network in
ensemble uRM-V10 ensemble uRM-V10 ensemble uRM-V10
Hamming distance between the two networks is ~ 60% of the maximum
possible between two bit strings.
There is a set of ~ 100 reactions which are present in all of our random
networks.
Areejit Samal
52. Necessity of MCMC sampling
Ensemble uRm-V10
Ensemble uRm-V5
Ensemble uRm-V1
Ensemble uRm
Ensemble Rm
Ensemble R
Embedded sets like Russian dolls
100
Fraction of genotypes that are viable
10-5
10-10
Including the viability constraint on the first
10-15
Glucose
chemical environment leads to a reduction by at 10-20
Acetate
Succinate
Analytic prefactor
least a factor 1022
2000 2200 2400 2600 2800
Reference: Samal et al, BMC Systems Biology (2010) Number of reactions in genotype
Areejit Samal
53. Summary
We have proposed a new method based
on MCMC sampling and FBA to
generate randomized metabolic
networks by successively imposing few
macroscopic biochemical and functional
constraints into our sampling criteria.
By imposing a few macroscopic
constraints into our sampling criteria, we
were able to approach the properties of
real metabolic networks.
We conclude that biological function is a
main driver of structure in metabolic
networks.
Areejit Samal
54. Limitations and Future Outlook
We can extend the study using the SEED database to many organisms.
Automated reconstruction of
metabolic models for >140 organisms
on which FBA can be performed.
The list of reactions in KEGG is incomplete and its use may reflect evolutionary
constraints associated with natural organisms.
From a synthetic biology context, there could be many more reactions that can
occur which are not in KEGG.
Alternate approach proposed by Basler et al, Bioinformatics, 2011 which
generates new mass-balanced reactions but does not impose functionality
constraint.
Generate new mass-balanced
reactions but no constraint of
biomass production.
Areejit Samal
55. Genotype networks: A many-to-one genotype to phenotype map
RNA Sequence Protein Sequence Gene network
Genotype
folding
Phenotype
Secondary structure Three dimensional Gene expression
of pairings structure levels
Reference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner (2007)
Genotype network refers to the set of (many) genotypes that have a given phenotype,
and their “organization” in genotype space. In the context of RNA, genotype networks
are commonly referred to as Neutral networks.
Areejit Samal
56. Properties of the genotype-to-phenotype map
Are there many genotypes that have a given
phenotype?
Do the genotypes with a given phenotype form a
significant fraction of all possible genotypes?
Are all the genotypes with a given phenotype rather
similar?
Are biological genotypes atypically robust
compared to random genotypes with similar
phenotype?
Are biological genotypes atypically innovative?
Can genotypes change gradually in response to
selective pressures?
A. Wagner, Nat. Rev. Genet. (2008)
Areejit Samal
57. Properties of the genotype-to-phenotype map
Are there many genotypes that have a given
phenotype?
Do the genotypes with a given phenotype form a
significant fraction of all possible genotypes?
We have considered these questions in the context of metabolic
Are all the genotypes with a given phenotype rather
similar? networks.
Are biological genotypes atypically robust enzymes or reactions
Genotype = lists of metabolic
compared to random genotypes with similar in a given environment
Phenotype = growth or not
phenotype?
Are biological genotypes atypically innovative?
Can genotypes change gradually in response to
selective pressures?
A. Wagner, Nat. Rev. Genet. (2008)
Areejit Samal
59. Acknowledgement
Reference
Olivier C. Martin
Université Paris-Sud
Other Work
Andreas Wagner João Rodrigues Jürgen Jost
University of Zurich Max Planck Institute for Mathematics in the Sciences
Genotype networks in metabolic reaction spaces
Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*
BMC Systems Biology 2010, 4:30
Environmental versatility promotes modularity in genome-scale metabolic networks
Areejit Samal, Andreas Wagner* and Olivier C Martin*
BMC Systems Biology 2011, 5:135
Funding
CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG
GDRE 513) for a Postdoctoral Fellowship.
Areejit Samal