Areejit Samal Randomizing metabolic networks

Large-scale structure of metabolic networks:
Role of Biochemical and Functional
constraints

Areejit Samal
Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ
Paris-Sud, Orsay, France
and
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Areejit Samal

Outline

 Architectural features of metabolic networks

 Issues with existing null models for metabolic networks

 Our method to generate realistic randomized metabolic
networks using Markov Chain Monte Carlo (MCMC)
sampling and Flux Balance Analysis (FBA)

 Biological function is a main driver of large-scale structure in
metabolic networks

Areejit Samal

Architectural features of real-world networks

Small-world Scale-free

Watts and Strogatz, Nature (1998) Barabasi and Albert, Science (1999)
Small Average Path Length Power law degree distribution
& High Local Clustering

Scale-free graphs can be generated using a simple model incorporating:
Growth & Preferential attachment scheme

Areejit Samal

Large-scale structure of metabolic networks

Small-world, Scale-free and Hierarchical Organization

Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)
Small Average Path Length, High Local Clustering, Power-law degree distribution

Bow-tie architecture

Directed graph with a giant component of
strongly connected nodes along with associated
IN and OUT component

Bow-tie architecture of the metabolism is similar
to that found for the WWW by Broder et al.

Ma and Zeng, Bioinformatics (2003)

Areejit Samal

Network motifs and Evolutionary design

Sub-graphs that are over-
represented in real networks
compared to randomized
networks with same degree
sequence.

Certain network motifs were shown to
perform important information processing
tasks.

For example, the coherent Feed Forward
Loop (FFL) can be used to detect
persistent signals.

Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004)

Areejit Samal

“Null hypothesis” based approach for identifying design
principles in real-world networks

Statistically significant network properties are
deduced as follows:
300

1) Measure the ‘chosen property’ (E.g., 250
Investigated
motif significance profile, average network
200

clustering coefficient, etc.) in the

Frequency
p-value
investigated real network. 150

100

2) Generate randomized networks with 50

structure similar to the investigated real
network using an appropriate null 0
0.60 0.62 0.64 0.66 0.68 0.70

model. Chosen Property

3) Use the distribution of the ‘chosen
property’ for the randomized networks
to estimate a p-value.

Areejit Samal

Edge-exchange algorithm: commonly-used null model

Randomized networks
preserve the degree at
each node and the degree
sequence.

Investigated Network

After 1 exchange

After 2 exchanges

Reference:
Shen-Orr et al Nature Genetics (2002); Milo et al Science (2002,2004); Maslov & Sneppen Science (2002)

Areejit Samal

Carefully posed null models are required for deducing
design principles with confidence

C. elegans Neuronal Network:
Close by neurons have a greater
chance of forming a connection
than distant neurons

A null model incorporating this
spatial constraint gives different
results compared to generic edge-
exchange algorithm.

Artzy-Randrup et al Science (2004)

Areejit Samal

Edge-exchange algorithm: case of bipartite metabolic
networks

asp-L cit

ASPT CITL

nh4 fum ac oaa

ASPT: asp-L → fum + nh4
CITL: cit → oaa + ac

Areejit Samal

networks
cit asp-L cit Null-model used in
asp-L
many studies
including: Guimera &
Amaral Nature (2005);
CITL ASPT* CITL* Samal et al BMC
ASPT
Bioinformatics (2006)

ac oaa nh4 fum ac oaa
nh4 fum

ASPT: asp-L → fum + nh4 ASPT*: asp-L → ac + nh4
CITL: cit → oaa + ac CITL*: cit → oaa + fum

Preserves degree of each node in the network
But generates fictitious reactions that violate
mass, charge and atomic balance satisfied by
real chemical reactions!!
Note that fum has 4 carbon atoms while ac has
2 carbon atoms in the example shown.

Areejit Samal

networks
cit asp-L cit Null-model used in
asp-L
many studies
including: Guimera &
Amaral Nature (2005);
CITL ASPT* CITL* Samal et al BMC
ASPT
Bioinformatics (2006)

ac oaa nh4 fum ac oaa
nh4 fum

ASPT: asp-L → fum + nh4 ASPT*: asp-L → ac + nh4
CITL: cit → oaa + ac CITL*: cit → oaa + fum

Preserves degree of each node in the network
But generates fictitious reactions that violate A common generic null model
mass, charge and atomic balance satisfied by (edge exchange algorithm) is
real chemical reactions!! unsuitable for different types
of real-world networks.
Note that fum has 4 carbon atoms while ac has
2 carbon atoms in the example shown.

Areejit Samal

Null models for metabolic networks: Blind-watchmaker network

Areejit Samal

Randomizing metabolic networks

We propose a new method using Markov Chain Monte Carlo (MCMC) sampling and
Flux Balance Analysis (FBA) to generate meaningful randomized ensembles for
metabolic networks by successively imposing biochemical and functional constraints.
Ensemble R

Increasing level of constraints
Given no. of valid Constraint I
biochemical reactions

+ Ensemble Rm Constraint II
Fix the no. of metabolites

+ Ensemble uRm Constraint III
Exclude Blocked Reactions

+ Ensemble uRm-V1
Functional constraint of
Constraint IV

growth on specified environments

Areejit Samal

Comparison with E. coli: Large-scale structural properties
of interest
In this work, we will generate randomized metabolic networks
using only reactions in KEGG and compare the structural
properties of randomized networks with those of E. coli.

We study the following large-scale structural properties:

• Metabolite Degree distribution

• Clustering Coefficient

• Average Path Length Scale Free and Small World
Ref: Jeong et al (2000) Wagner & Fell (2001)

• Pc: Probability that a path exists between two nodes in
the directed graph

• Largest strongly component (LSC) and the union of
LSC, IN and OUT components

Bow-tie architecture of the
E. coli metabolic network has m (~750) metabolites Internet and metabolism
and r (~900) reactions. Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)

Areejit Samal

glc-D HEX1: atp + glc-D → adp + g6p + h
atp
PGI: g6p ↔ f6p
Different Graph-theoretic Representations of the
PFK: atp + f6p → adp + fdp + h

HEX1 Currency Other
Metabolite Metabolite

Irreversible Reversible
g6p Reaction
Metabolic Network

Reaction

PGI
glc-D

f6p
g6p

PFK
f6p

h fdp adp
fdp

(a) Bipartite Representation (b) Unipartite Representation

atp
PGI: g6p ↔ f6p

HEX1 Currency Other

Degree Distribution
g6p Reaction
Metabolic Network

Reaction

PGI
glc-D
Clustering coefficient
Average Path Length
Largest Strong Component
f6p
g6p

PFK
f6p

h fdp adp
fdp


atp
PGI: g6p ↔ f6p

HEX1 Currency Other

Degree Distribution
g6p Reaction
Metabolic Network

Reaction

PGI
glc-D
Clustering coefficient
Average Path Length
Largest Strong Component
f6p
g6p

The values obtained for different network measures depend on the list
PFK
f6p
of currency metabolites used to generate the unipartite graph of
metabolites starting from the bipartite graph. In this work, we have
h
usedfdpwell defined biochemically sound list of currency metabolites to
a adp
fdp
construct unipartite graph corresponding to each metabolic network.

Constraint I: Given number of valid biochemical
reactions

 Instead of generating fictitious reactions using
edge exchange mechanism, we can restrict to
>6000 valid reactions contained within KEGG
database.

 We use the database of 5870 mass balanced
reactions derived from KEGG by Rodrigues and
Wagner. For simplicity, I will refer to this database
in the talk as ‘KEGG’

Areejit Samal

Global Reaction Set

Nutrient metabolites Biomass composition

The set of external The E. coli biomass
metabolites in iJR904 composition of iJR904
were allowed for was used to determine
uptake/excretion in the + viability of genotypes.
global reaction set.

We can implement Flux
Balance Analysis (FBA)
within this database.

KEGG Database + E. coli iJR904

Areejit Samal

Genome-scale metabolic models: Flux Balance Analysis (FBA)

List of metabolic reactions with
stoichiometric coefficients
Growth rate for
Flux Balance the given medium
Biomass composition Analysis (FBA) ‘Biomass Yield’

Fluxes of all
reactions
Set of nutrients in the
environment

Advantages
FBA does not require enzyme kinetic information which is not known for most reactions.

Disadvantages
FBA cannot predict internal metabolite concentrations and is restricted to steady states.
Basic models do not account for metabolic regulation.

Reference: Varma and Palsson, Biotechnology (1994); Price et al Nature Reviews Microbiology (2004)

Areejit Samal

Bit string representation of metabolic networks within the global
reaction set

KEGG Database E. coli

R1: 3pg + nad → 3php + h + nadh 1
Present - 1
R2: 6pgc + nadp → co2 + nadph + ru5p-D 0
R3: 6pgc → 2ddg6p + h2o 1 Absent - 0
. .
. .
. .
6000
C900
RN: --------------------------------- .
Contains r reactions
N = 5870 reactions within KEGG (1’s in the bit string)
(or global reaction set)

The E. coli metabolic network or any random network of reactions within KEGG can be
represented as a bit string of length N with exactly r entries equal to 1.
r is the number of reactions in the E. coli metabolic network.

Areejit Samal

Ensemble R: Random networks within KEGG with same number of
reactions as E. coli
Random subsets or networks with r reactions

To compare with E. coli, we generate an
ensemble R of random networks as
KEGG with
N=5870 follows:
reactions
Pick random subsets with exactly r (~900)
reactions within KEGG database of N
(~6000) reactions.
r is the no. of reactions in E. coli.
1 0 1 1
0 0 1 0
0 1 1 1 Huge no. of networks possible within
. . . . KEGG with exactly r reactions
. . . . 6000
~ C900
. . . .
. . . .
Fixing the no. of reactions is equivalent to
Each random network or bit string fixing genome size.
contains exactly r reactions (1’s in
the string) like in E. coli.

Areejit Samal

Metabolite degree distribution

A degree of a metabolite is the number of reactions in which the metabolite participates in the
network.
100

10-1

10-2
P(k)

10-3

10-4

E. coli
10-5

10-6
1 10 100 1000
k

E. coli: = 2.17;
Most organisms:  is 2.1-2.3
Reference:
Jeong et al (2000);
Wagner and Fell (2001)

Areejit Samal


network.
100
100

10-1
10-1

10-2
10-2

P(k)
P(k)

10-3
10-3

10-4
10-4
E. coli
E. coli
10-5 Random
10-5

10-6
10-6
1 10 100 1000
1 10 100 1000
k
k

E. coli: = 2.17; Random networks in
Most organisms:  is 2.1-2.3 ensemble R: = 2.51
Reference:
Jeong et al (2000);

Areejit Samal


network.
100 100
100
10-1
-1 10-1
10
10-2
10-2
10-2 10-3

P(k)
P(k)
P(k)

10-3 10-4
10-3
10-5
10-4
10-4
E. coli
10-6 E. coli
E. coli Random
10-5 Random
10-5 10-7 KEGG

10-6 10-8
10-6
1 10 100 1000 1 10 100 1000
1 10 100 1000
k k
k

E. coli: = 2.17; Random networks in Complete KEGG:
Most organisms:  is 2.1-2.3 ensemble R: = 2.51 = 2.31
Reference:
Jeong et al (2000);

Any (large enough) random subset of reactions in KEGG has Power Law degree
distribution !!

Areejit Samal

Reaction Degree Distribution

The degree of a reaction is the number of metabolites that participate in it.

3000
0 Currency
1 Currency
2500 2 Currency
3 Currency atp adp
4 Currency
5 Currency
2000
6 Currency R
Frequency

7 Currency

1500 A B

1000 In each reaction, we have counted the number of
currency and other metabolites which is shown in
500 the figure.
Most reactions involve 4 metabolites of which 2
0
0 2 4 6 8 10 12
are currency metabolites and 2 are other
Number of substrates in a reaction
metabolites.

The reaction degree distribution is bell shaped with typical reaction in the network
involving 4 metabolites.
The nature of reaction degree distribution is very different from the metabolite
degree distribution which follows a power law.

Areejit Samal

Scale-free versus Scale-rich network:
Origin of Power laws in metabolic networks
Tanaka and Doyle have suggested a classification of metabolites into three
categories based on their biochemical roles
(a) ‘Carriers’ (very high degree) atp adp
(b) ‘Precursors’ (intermediate degree)
R
(c) ‘Others’ (low degree)
A B

It has been argued that gene duplication
and divergence at the level of protein
interaction networks can lead to
observed power laws in metabolic
networks. However, the presence of
ubiquitous (high degree) ‘currency’
metabolites along with low degree
‘other’ metabolites in each reaction can
explain the observed fat tail in the
metabolite distribution.

Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)

Areejit Samal

Network Properties: Given number of valid biochemical
reactions
10

0.10
8

0.08
6

<L>
<C>

0.06

4
0.04

2
0.02

0.00 0
R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli R RM uRM uRM-V1 uRM-V5 uRM-V10 E. coli

Clustering Coefficient Path Length
0.8
1.0
LSC
LSC + IN + OUT

0.6 0.8

Fraction of nodes
0.6
Pc

0.4

0.4

0.2
0.2

0.0 0.0

Probability that a path exists Largest Strong Component
between two nodes

Areejit Samal

Constraint II: Fix the number of metabolites

Direct sampling of randomized networks with same no. of reactions and metabolites as in
E. coli is not possible.

Markov Chain Monte Carlo (MCMC) sampling of randomized networks

KEGG Database E. coli Step 1 Step 2 Ensemble RM

R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
0 Swap 1 Swap 0 106 Swaps 1 10 Swaps
3
0
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept
. 1 1 0 0 1
. . . Erase . Saving .
. memory of
RN: --------------------------------- . . . . Frequency .
initial
N = 5870 reactions within KEGG network
reaction universe Reject Reject
store store
Reaction swap keeps the number of reactions in network constant

Accept/Reject Criterion of reaction swap:
Accept if the no. of metabolites in the new network is less than or equal to E. coli.

Areejit Samal

Network Properties: After fixing number of metabolites
10

0.10
8

0.08
6

<L>
<C>

0.06

4
0.04

2
0.02

0.00 0

0.8
1.0
LSC
LSC + IN + OUT

0.6 0.8

Fraction of nodes
0.6
Pc

0.4

0.4

0.2
0.2

0.0 0.0

between two nodes

Areejit Samal

Constraint III: Exclude Blocked Reactions

 Large-scale metabolic networks typically have certain
dead-end or blocked reactions which cannot contribute to
KEGG
growth of the organism.
Unblocked
KEGG  Blocked reactions have zero flux for every investigated
chemical environment under any steady-state condition and
do not contribute to biomass growth flux.

 We next exclude blocked reactions from our biochemical
universe used for sampling randomized networks.

For Blocked reactions, see Burgard et al Genome Research (2004)

Areejit Samal

MCMC Sampling: Exclusion of blocked reactions

Ensemble uRM
KEGG Database E. coli Step 1 Step 2

R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
3
0
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept Erase Saving
. 1 1 0 0 1
. . . memory of . Frequency .
. initial
RN: --------------------------------- . . . network
. .
N = 5870 reactions within KEGG
store store


Restrict the reaction swaps to the set of unblocked reactions within KEGG.

Accept if
the no. of metabolites in the new network is less than or equal to E. coli.

Areejit Samal

Network Properties: After excluding blocked reactions
10

0.10
8

0.08
6

<L>
<C>

0.06

4
0.04

2
0.02

0.00 0

0.8
1.0
LSC
LSC + IN + OUT

0.6 0.8

Fraction of nodes
0.6
Pc

0.4

0.4

0.2
0.2

0.0 0.0

between two nodes

Areejit Samal

Constraint IV: Growth Phenotype

Definition of Phenotype
Genotypes Ability to synthesize Viability
biomass components
0
1
1
. Viable
genotype
.
. Flux Growth
. GA Balance
0 Analysis
No Growth
0 (FBA)
1 Unviable
. genotype
x
.
. Deleted

. GB reaction

Bit string and equivalent network
representation of genotypes

For FBA, see Price et al (2004) for a review

Areejit Samal

MCMC Sampling: Growth phenotype in one environment

Ensemble uRM-V1
KEGG Database E. coli Step 1 Step 2

R1: 3pg + nad → 3php + h + nadh 1 1 1 1 0
3
0
R3: 6pgc → 2ddg6p + h2o 1 0 1 1 1
. 0 0 1 0 0
Accept
. 1 1 0 Erase 0 Saving 1
. . . . memory of . Frequency .
RN: --------------------------------- . . . initial . .
network
N = 5870 reactions within KEGG
store store


Restrict the reaction swaps to the set of unblocked reactions within KEGG.

Accept if
(a) No. of metabolites in the new network is less than or equal to E. coli
(b) New network is able to grow on the specified environment

Areejit Samal

Network Properties: Growth phenotype in one
environment
10

0.10
8

0.08
6

<L>
<C>

0.06

4
0.04

2
0.02

0.00 0

0.8
1.0
LSC
LSC + IN + OUT

0.6 0.8

Fraction of nodes
0.6
Pc

0.4

0.4

0.2
0.2

0.0 0.0

between two nodes

Areejit Samal

Network Properties: Growth phenotype in multiple
environments
10

0.10
8

0.08
6

<L>
<C>

0.06

4
0.04

2
0.02

0.00 0

0.8
1.0
LSC
LSC + IN + OUT

0.6 0.8

Fraction of nodes
0.6
Pc

0.4

0.4

0.2
0.2

0.0 0.0

between two nodes

Areejit Samal

Global structural properties of real metabolic networks:
Consequence of simple biochemical and functional constraints

Increasing level of constraints
Conjectured by:
A. Wagner (2007) and
B. Papp et al. (2008)

Samal and Martin, PLoS ONE (2011) 6(7): e22295

Areejit Samal

High level of genetic diversity in our randomized ensembles

Any two random networks in our most constrained ensemble uRM-v10 differ in
~ 60% of their reactions.

1 1 1 1
0 0 0 0
1 0 1 0
. . . versus .
versus
. . . .
. . . .
. . . .

E. coli Random network in Random network in Random network in
ensemble uRM-V10 ensemble uRM-V10 ensemble uRM-V10

Hamming distance between the two networks is ~ 60% of the maximum
possible between two bit strings.
There is a set of ~ 100 reactions which are present in all of our random
networks.

Areejit Samal

Necessity of MCMC sampling

Ensemble uRm-V10

Ensemble uRm-V5
Ensemble uRm-V1
Ensemble uRm
Ensemble Rm

Ensemble R
Embedded sets like Russian dolls
100

Fraction of genotypes that are viable
10-5

10-10

Including the viability constraint on the first
10-15

Glucose

chemical environment leads to a reduction by at 10-20
Acetate
Succinate
Analytic prefactor
least a factor 1022
2000 2200 2400 2600 2800

Reference: Samal et al, BMC Systems Biology (2010) Number of reactions in genotype

Areejit Samal

Summary

 We have proposed a new method based
on MCMC sampling and FBA to
generate randomized metabolic
networks by successively imposing few
macroscopic biochemical and functional
constraints into our sampling criteria.

 By imposing a few macroscopic
constraints into our sampling criteria, we
were able to approach the properties of
real metabolic networks.

 We conclude that biological function is a
main driver of structure in metabolic
networks.

Areejit Samal

Limitations and Future Outlook

 We can extend the study using the SEED database to many organisms.

Automated reconstruction of
metabolic models for >140 organisms
on which FBA can be performed.

 The list of reactions in KEGG is incomplete and its use may reflect evolutionary
constraints associated with natural organisms.
 From a synthetic biology context, there could be many more reactions that can
occur which are not in KEGG.
 Alternate approach proposed by Basler et al, Bioinformatics, 2011 which
generates new mass-balanced reactions but does not impose functionality
constraint.
Generate new mass-balanced
reactions but no constraint of
biomass production.

Areejit Samal

Genotype networks: A many-to-one genotype to phenotype map

RNA Sequence Protein Sequence Gene network
Genotype
folding

Phenotype

Secondary structure Three dimensional Gene expression
of pairings structure levels

Reference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner (2007)

Genotype network refers to the set of (many) genotypes that have a given phenotype,
and their “organization” in genotype space. In the context of RNA, genotype networks
are commonly referred to as Neutral networks.

Areejit Samal

Properties of the genotype-to-phenotype map

 Are there many genotypes that have a given
phenotype?

 Do the genotypes with a given phenotype form a
significant fraction of all possible genotypes?

 Are all the genotypes with a given phenotype rather
similar?

 Are biological genotypes atypically robust
compared to random genotypes with similar
phenotype?

 Are biological genotypes atypically innovative?

 Can genotypes change gradually in response to
selective pressures?
A. Wagner, Nat. Rev. Genet. (2008)

Areejit Samal

Properties of the genotype-to-phenotype map

 Are there many genotypes that have a given
phenotype?

 Do the genotypes with a given phenotype form a
significant fraction of all possible genotypes?
We have considered these questions in the context of metabolic
 Are all the genotypes with a given phenotype rather
similar? networks.

 Are biological genotypes atypically robust enzymes or reactions
Genotype = lists of metabolic
compared to random genotypes with similar in a given environment
Phenotype = growth or not
phenotype?

 Are biological genotypes atypically innovative?

 Can genotypes change gradually in response to
selective pressures?
A. Wagner, Nat. Rev. Genet. (2008)

Areejit Samal

Computational tool for Evolutionary Systems Biology

Areejit Samal

Acknowledgement
Reference

Olivier C. Martin
Université Paris-Sud
Other Work

Andreas Wagner João Rodrigues Jürgen Jost
University of Zurich Max Planck Institute for Mathematics in the Sciences
Genotype networks in metabolic reaction spaces
Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*
BMC Systems Biology 2010, 4:30

Environmental versatility promotes modularity in genome-scale metabolic networks
Areejit Samal, Andreas Wagner* and Olivier C Martin*
BMC Systems Biology 2011, 5:135
Funding

CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG
GDRE 513) for a Postdoctoral Fellowship.

Areejit Samal

Areejit Samal Randomizing metabolic networks

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Areejit Samal Randomizing metabolic networks

Ähnlich wie Areejit Samal Randomizing metabolic networks (20)

Mehr von Areejit Samal

Mehr von Areejit Samal (7)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Areejit Samal Randomizing metabolic networks