APPLICATION OF GENE EXPRESSION PROGRAMMING IN FLOOD FREQUENCY ANALYSIS

J. Indian Water Resour. Soc., Vol 35, No. 2, April 2015
INTRODUCTION
A flood is usually a high-level flow of water in a river that
overflows its banks and submerges the nearby area. The flood
problem in India is mostly confined to the states located in
the Indo-Gangetic plains, north-east India and, occasionally,
rivers of central India. As the bulk of summer monsoon
rainfall occurs within a period of four months (June to
September), the majority of floods occur during these
months (Dhar et al. 2003). Flood forecasting and estimation
play a key role in the design of hydraulic structures such as
bridges, spillways for dams, culvert waterways, roads and
railways, and in flood plain zoning, urban drainage systems and
the economic evaluation of flood protection works. Since a flood
is a very complex natural event depending upon the
characteristics of the catchment, rainfall conditions and
various other factors, its analytical modelling is very
difficult to pursue. However, various statistical methods are
available for the prediction of peak flood. One such method,
widely used in India and the UK, is Gumbel’s extreme value
distribution method, whereas the Log-Pearson type III
distribution method is used in the USA. Nadarajah et al. (2005)
analysed 39 years of flood data of the Pachang River in Taiwan
and concluded that Gumbel’s method provides a reasonable model
for both flood volume and flood peak. Rostami et al. (2007) used
the L-moment approach for the regional flood frequency analysis
of the Halil river basin. Stedinger et al. (2008) were of the
opinion that the Expected Moments Algorithm (EMA) should be
adopted by the US flood management community for flood frequency
analysis because it provides a direct fit of the LP3
distribution using the entire data set. Gaal et al. (2010)
reviewed methods to incorporate historical floods into at-site
flood frequency analysis based on Bayesian inference, where a
likelihood function is built to properly handle the information
on historical floods. Mujere (2011) analysed 30 years of peak
flood data of the Nyanyadzi River in Zimbabwe and concluded that
Gumbel’s distribution predicts river flood magnitudes very
efficiently. Abbulu et al. (2013) carried out flood frequency
analysis for reservoirs in Visakhapatnam, India, using
probability weighted moment methods. They concluded that the
L-moment method gives a better plotting position but has some
limitations, so that the Gringorten formula is the best plotting
position method with Gumbel’s distribution.
Recent advances in soft computing techniques and their
application in hydraulic engineering have challenged the
conventional methods of analysis. Various hydraulic
engineering problems in general, and flood frequency analysis
in particular, are now being solved using several Artificial
Intelligence (AI) techniques, viz. Artificial Neural Networks
(ANN), Genetic Algorithm (GA), Genetic Programming (GP),
Gene Expression Programming (GEP), Group Method of Data
Handling (GMDH) etc. The soft computing tool of Genetic
Programming, which is essentially classified as an Evolutionary
Computation (EC) technique, has found its footing in the field
of hydraulic engineering in general and the modelling of water
flows in particular over the last 12 years (Shreenivas, 2012).
Khalid et al. (2008) built a hydraulic jump model using Multiple
Linear Regression (MLR), compared it with Gene Expression
Programming (GEP) and found that the GEP model gave a higher
correlation coefficient than MLR but was more complicated
than the MLR model. They also concluded that GEP is a
promising AI approach for hydraulic data modelling.
Azamathulla et al. (2011) used GEP for estimating the
stage-discharge relationship for the Pahang River in Malaysia
and compared their results with the conventional methods. They
observed that the performance of the GEP model was
substantially superior to both GP and the conventional
models. Seckin et al. (2012) applied GEP and linear
genetic programming (LGP) in addition to logistic regression
(LR) to forecast peak flood discharges and found that the
predictions made by GEP were more precise than those of the LGP
and LR methods. Mujahid et al. (2012) concluded that the
performance of GEP was satisfactory and encouraging when
compared with regression and ANN models in predicting bridge
pier scour depth. They also mentioned that GEP has the unique
capability of providing a compact and explicit mathematical
expression for computing bridge scour. Zahiri et al. (2012) used
GEP for the prediction of flow discharge in compound channels,
compared the results with the vertical divided channel method
(VDCM), and concluded that the GEP model predicts discharge
more accurately than VDCM, as this conventional approach
overestimates the discharge ratios
Mohd. Muzzammil¹, Javed Alam*² and Mohd Danish³
ABSTRACT
Flood frequency and magnitude are essential for the proper design of hydraulic structures such as bridges, spillways, culverts, waterways, roads, railways,
flood control structures and urban drainage systems. Since a flood is a very complex natural event depending upon the characteristics of the catchment, rainfall
conditions and various other factors, its analytical modelling is very difficult to pursue. Recently, artificial intelligence techniques such as gene expression
programming (GEP), artificial neural network (ANN) etc. have been found to be efficient in modelling complex problems in hydraulic engineering. The
performance of the GEP model has been reported to be better than that of the ANN. Moreover, GEP provides a mathematical equation, which makes it superior
to other soft computing techniques that do not give any analytical mathematical equation. Therefore, in the present study, GEP is implemented in flood
frequency analysis for a typical Indian river gauging station. The results obtained in the present study are highly promising and suggest that GEP modelling is a
versatile technique and represents an improved alternative to the more conventional approach for flood frequency analysis.
Keywords: Flood frequency analysis, GEP, ANN, Gumbel’s distribution
1. Professor, Department of Civil Engineering, ZH College of
Engineering & Technology, AMU, Aligarh-202002.
2. Associate Professor, 3. M.Tech. Student
Email:muzammil786@rediffmail.com;
javed_alig2000@yahoo.co.in; mohd.danish999@gmail.com.
* Corresponding Author, (LM-95-4697)
Manuscript No.: 1383

with large errors. Azamathulla et al. (2012) used GEP and
ANN-RBF models to predict the values of relative scour depth
from laboratory culvert-scour measurements and observed that
the overall performance of GEP was superior to that of ANN. Aziz
et al. (2013) used GEP and ANN techniques for the regional
estimation of floods in Australia. They found that the
performance of the ANN model was better than that of the GEP
model.
Past studies revealed that Gumbel’s method is the most popular
and reliable method for flood frequency analysis, whereas
the performance of GEP is considered to be the best among the
soft computing techniques. Hence, in the present study, flood
frequency analysis of the Ganga River at Hardwar has been
carried out using Gumbel’s distribution and GEP. The
performance of these models has been assessed to identify a
reliable flood prediction method for the Indian environment.
ANN was also considered, for comparison purposes only.
METHODS OF FLOOD FREQUENCY ANALYSIS
Gumbel’s method has been widely used in India for flood
prediction problems for many years, whereas ANN and GEP have
been used in hydraulic engineering for only a decade or two. All
these methods are briefly described in the following sections.
Gumbel’s Method
The extreme value distribution was introduced by Gumbel in
1947 and is commonly known as Gumbel’s distribution
(Subramanya, 2010). It is one of the most widely used
probability distribution functions for extreme values in
hydrologic and meteorological studies for the prediction of
flood peaks, maximum rainfalls, maximum wind speed etc. Gumbel
defined a flood as the largest flow among the daily flows for a
year, so that the annual series of flood flows constitutes a
series of largest values of flows. According to his theory of
extreme events, the probability of occurrence of an event equal
to or larger than a value $x_0$ is given by:
$$P(X \ge x_0) = 1 - e^{-e^{-y}} \qquad (1)$$
where $y = \alpha (x - a)$; $a = \bar{x} - 0.45005\,\sigma_x$;
$\alpha = 1.2825/\sigma_x$; $y$ is a dimensionless variable;
$\bar{x}$ = mean and $\sigma_x$ = standard deviation of the
variate X.
The reduced variate for a given return period T is given as:
$$y_T = -\ln\left[\ln\left(\frac{T}{T-1}\right)\right] \qquad (2)$$
The value of the variate X with the return period T ($X_T$) is
given by:
$$X_T = \bar{x} + K\,\sigma_x \qquad (3)$$
$$K = \frac{y_T - \bar{y}_n}{S_n} = \text{frequency factor} \qquad (4)$$
where $\bar{y}_n$ and $S_n$ are the reduced mean and reduced
standard deviation in Gumbel’s extreme value distribution.
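The relations (1) to (4) can be sketched numerically as follows. This is a minimal illustration rather than the authors' computation: the function names are hypothetical, and the default reduced mean (0.5772) and reduced standard deviation (1.2825) are the large-sample limits, assumed here in place of the tabulated values for the actual sample size. The mean and standard deviation in the usage lines are those reported for the Hardwar data in this paper.

```python
import math

def exceedance_probability(y):
    """Eq. (1): probability of an event equal to or larger than x0, for reduced variate y."""
    return 1.0 - math.exp(-math.exp(-y))

def reduced_variate(T):
    """Eq. (2): Gumbel reduced variate for a return period T in years (T > 1)."""
    return -math.log(math.log(T / (T - 1.0)))

def frequency_factor(T, y_n, s_n):
    """Eq. (4): frequency factor K from the reduced mean y_n and reduced std s_n."""
    return (reduced_variate(T) - y_n) / s_n

def gumbel_quantile(T, mean, std, y_n=0.5772, s_n=1.2825):
    """Eq. (3): X_T = mean + K * std.

    The defaults for y_n and s_n are the large-sample limits (an assumption);
    tabulated values for the actual sample size should be preferred.
    """
    return mean + frequency_factor(T, y_n, s_n) * std

if __name__ == "__main__":
    # Sample statistics reported for the Hardwar series (cumec).
    for T in (10, 50, 100):
        print(T, round(gumbel_quantile(T, mean=6027.0, std=2640.0)))
```

Substituting the reduced variate of Eq. (2) into Eq. (1) returns 1/T, which is a convenient consistency check on the two formulas.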
Artificial Neural Network (ANN)
Artificial neural networks provide a random mapping between
an input and an output vector, typically consisting of three
layers of neurons, namely, input, hidden and output, with each
neuron acting as an independent computational element. The
strength of neural networks is derived from the high degree of
freedom associated with their architecture. Prior to application,
the network is trained using observed data sets. This feeds the
network with input and output pairs and determines the values
of connection weights, bias or centres. The training may
require the completion of many epochs until the training sum
of squares error reaches a specified error goal. The concepts
behind these training schemes were outlined by the
American Society of Civil Engineers (ASCE) Task Committee
(2000a, b) (Azamathulla et al., 2011).
Gene Expression Programming (GEP)
Gene expression programming (GEP) was invented by Candida
Ferreira in 1999 (Ferreira, 2001). GEP, like its predecessors
genetic algorithms (GA) and genetic programming (GP),
uses populations of individuals, selects them according to
fitness and introduces genetic variation using one or more
genetic operators (Ferreira, 2001). In GA the individuals are
linear strings of fixed length (chromosomes), while in GP the
individuals are nonlinear entities of different sizes and shapes
(parse trees). In GEP, on the other hand, the individuals are
encoded as linear strings of fixed length (the genome or
chromosomes) which are afterwards expressed as nonlinear
entities of different sizes and shapes (i.e. simple diagram
representations or expression trees). The great insight of GEP
was the invention of chromosomes capable of
representing any expression tree. For that, Ferreira (2001)
created a new language (which she named Karva)
to read and express the information of GEP chromosomes.
Furthermore, the structure of chromosomes was designed to
allow the creation of multiple genes, each encoding a sub-
expression tree. The genes are structurally organized in a head
and a tail, and it is this structural and functional organization of
GEP genes that always guarantees the production of valid
programs, no matter how much or how profoundly the
chromosomes are modified.
A gene consists of a fixed number of symbols encoded in the
Karva language. A gene has two sections, the head and the tail.
The head is used to encode functions for the expression. The
tail is a reservoir of extra terminal symbols that can be used if
there aren’t enough terminals in the head to provide arguments
for the functions. Thus, the head can contain functions,
variables and constants, but the tail can contain only variables
and constants (i.e. terminals). The number of symbols in the
tail is determined by the equation t = [h (MaxArg – 1) + 1],
where ‘t’ is the number of symbols in the tail, ‘h’ is the number
of symbols in the head and ‘MaxArg’ is the maximum number
of arguments required by any function that is allowed to be
used in the expression. The key to GEP’s ability to quickly
mutate valid expressions is the way it encodes symbols in
genes. Consider a simple mathematical expression,
, now this expression can be represented by expression
tree as shown in Fig.1. This expression tree (ET) can be
encoded into Karva language and the corresponding expression
is known as K-expression.
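The tail-length rule stated above can be checked with a few lines of Python. The reasoning behind it is that, in the worst case, every head symbol is a function taking MaxArg arguments, so h functions need h × MaxArg arguments, of which h − 1 are supplied by the other head symbols; the remaining h (MaxArg − 1) + 1 must be terminals from the tail. The function names below are hypothetical; the numerical values are those listed in Table 1 of this paper.

```python
def tail_length(head_length, max_arity):
    """t = h * (MaxArg - 1) + 1: terminals needed so that any head decodes validly."""
    return head_length * (max_arity - 1) + 1

def gene_length(head_length, max_arity):
    """Total number of symbols in one gene (head plus tail)."""
    return head_length + tail_length(head_length, max_arity)

if __name__ == "__main__":
    h, max_arg, genes = 10, 2, 6      # head length, binary operators, genes (Table 1)
    print(tail_length(h, max_arg))    # tail length for these settings
    print(genes * gene_length(h, max_arg))  # symbols per chromosome
```

With a head length of 10 and functions of at most two arguments, the rule gives a tail length of 11, which agrees with Table 1.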

Fig. 1: Expression Tree for the above mathematical
expression
To convert an expression tree to the Karva notation, start at the
left-most symbol in the top line of the tree and scan symbols
left-to-right and top-to-bottom. Each time a symbol is
encountered, add it to the K-expression in left-to-right order.
When there are no more symbols on a line, advance to the left
end of the following line. Thus, the K-expression for the above
ET can be represented as + √ a b + c d.
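The reverse operation, reading a K-expression back into an expression tree, fills the tree level by level in the same left-to-right order. The sketch below illustrates this under stated assumptions: the symbol "Q" for the square root, the arity table and the function names are choices made for this example, and the K-expression used is a generic textbook-style one, not the expression of Fig. 1.

```python
import math
from collections import deque

# Number of arguments per function symbol; "Q" denotes the square root (assumed notation).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}

def build_tree(kexpr):
    """Decode a K-expression breadth-first into nested lists [symbol, child, ...]."""
    symbols = list(kexpr)
    root = [symbols[0]]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):  # terminals have arity 0
            child = [symbols[pos]]
            pos += 1
            node.append(child)
            queue.append(child)
    return root  # symbols beyond `pos` form the non-coding region and are ignored

def evaluate(node, values):
    """Evaluate a decoded tree for a dict of terminal values."""
    sym, children = node[0], node[1:]
    if sym not in ARITY:
        return values[sym]
    args = [evaluate(child, values) for child in children]
    if sym == "+":
        return args[0] + args[1]
    if sym == "-":
        return args[0] - args[1]
    if sym == "*":
        return args[0] * args[1]
    if sym == "/":
        return args[0] / args[1]
    return math.sqrt(args[0])  # sym == "Q"

if __name__ == "__main__":
    tree = build_tree("Q*+-abcd")  # decodes to sqrt((a + b) * (c - d))
    print(evaluate(tree, {"a": 1, "b": 2, "c": 7, "d": 4}))
```

Because decoding stops as soon as every function has its arguments, any trailing symbols are simply unused, which is why modifications to a gene never produce an invalid tree.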
In GEP, just as in other evolutionary methods, the process
starts with the random generation of an initial population
consisting of chromosomes of fixed length. The chromosomes
may contain one or more genes. Each chromosome is
then expressed and its fitness is evaluated using one of the
fitness function equations available in the literature. These
chromosomes are then selected based on their fitness values
using a roulette wheel selection process. Fitter chromosomes
have greater chances of selection for passage to the next
generation. After selection, these are reproduced with some
modifications performed by the genetic operators. In Gene
Expression Programming, genetic operators such as mutation,
inversion, transposition and recombination are used for these
modifications. Mutation is the most efficient genetic operator,
and it is sometimes used as the only means of modification. The
new individuals are then subjected to the same process of
modification, and the process continues until the maximum
number of generations is reached or the required accuracy is
achieved.
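Roulette wheel selection, as used in the loop described above, picks each chromosome with a probability proportional to its fitness. The following is a minimal sketch of that idea; the function name and the use of Python's `random` module are assumptions of this example, not details taken from the GEP software used in the study.

```python
import random

def roulette_select(fitnesses, rng=random):
    """Return the index of one individual, chosen with probability fitness / total."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    cumulative = 0.0
    for index, fitness in enumerate(fitnesses):
        cumulative += fitness
        if pick < cumulative:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off

if __name__ == "__main__":
    rng = random.Random(42)
    fitnesses = [10.0, 30.0, 60.0]  # illustrative values only
    next_generation = [roulette_select(fitnesses, rng) for _ in range(10)]
    print(next_generation)
```

An individual with zero fitness is never selected, while fitter individuals may be selected several times for the next generation.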
Mutation: In GEP there are several types of mutation; some are
simple random changes in the symbols of genes, while others are
more complex and involve reversing the order of symbols or
transposing symbols or genes within the chromosome. Simple
mutation just replaces symbols in genes with replacement
symbols. Symbols in the heads of genes can be replaced by
functions or terminals (variables and constants). Symbols in the
tail sections can be replaced only by terminals.
Inversion: It reverses the order of symbols in a section of a
gene.
Transposition: It selects a group of symbols and moves them to
a different position within the same gene, or moves an
entire gene around in the chromosome.
Recombination: Two chromosomes are randomly selected, and
genetic material is exchanged between them to produce two new
chromosomes. It is analogous to the process that occurs when
two individuals are bred, and the offspring share a mixture of
genetic material from both parents (Azamathulla et al., 2011).
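Three of these operators can be sketched on a single gene stored as a list of symbols. This is an illustrative sketch only: the symbol sets and function names are hypothetical, inversion is restricted to the head here (an assumption made so that the tail stays terminal-only), and transposition is omitted for brevity.

```python
import random

FUNCTIONS = ["+", "-", "*", "/", "Q"]   # "Q" denotes the square root (assumed notation)
TERMINALS = ["a", "b"]                  # illustrative terminals

def mutate(gene, head_length, rate, rng):
    """Point mutation: heads may receive any symbol, tails only terminals."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if i < head_length else TERMINALS
            out[i] = rng.choice(pool)
    return out

def invert(gene, head_length, rng):
    """Reverse a randomly chosen run of symbols inside the head (needs head_length >= 2)."""
    out = list(gene)
    i, j = sorted(rng.sample(range(head_length), 2))
    out[i:j + 1] = out[i:j + 1][::-1]
    return out

def one_point_recombination(parent_a, parent_b, rng):
    """Swap everything after a random cut point between two equal-length parents."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

if __name__ == "__main__":
    rng = random.Random(1)
    gene = list("+Q*a" + "babab")        # head of 4 symbols, tail of 5
    print("".join(mutate(gene, 4, 0.2, rng)))
    print("".join(invert(gene, 4, rng)))
```

Because the tail only ever receives terminals and the gene length never changes, every modified gene still decodes to a valid expression tree.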
APPLICATION OF VARIOUS FLOOD FREQUENCY TECHNIQUES
The annual peak flood data for the period 1901-1977 (77
years) used in the present study are from the river Ganga at
Hardwar. The Hardwar site is around 110 kilometres
downstream of Tehri dam. The Bhimgoda barrage constructed at
Hardwar diverts a large quantity of its waters into the Upper
Ganga Canal, which irrigates the doab region of Uttar Pradesh.
The peak flood data at Hardwar are
available in Varshney (1979). The mean and standard deviation
of the flood data used are 6027 and 2640 cumec respectively.
Derivation of a Relation of Peak Flood with Return Period based on GEP
The annual peak discharge (Q) was modelled in terms of the
recurrence interval (T) using a GEP approach. The GEP
modelling is generally carried out in five major steps:
(i) The first step is to select the proper fitness function. The
fitness, $f_i$, of an individual program i is measured by
$$f_i = \sum_{j=1}^{C_t} \left( M - \left| C_{(i,j)} - T_j \right| \right) \qquad (5)$$
where M is the range of selection, $C_{(i,j)}$ is the value
returned by the individual chromosome i for the fitness case j
(out of $C_t$ fitness cases) and $T_j$ is the target value for
the fitness case j.
The advantage of this kind of fitness function is that the system
can find the optimum solution by itself.
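The fitness measure of Eq. (5) amounts to summing, over all fitness cases, the selection range minus the absolute error. A minimal sketch follows; the function name and the numbers in the usage lines are purely illustrative and are not taken from the study.

```python
def absolute_error_fitness(predictions, targets, selection_range):
    """Eq. (5): sum over fitness cases of M - |C(i,j) - T_j|."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(selection_range - abs(c - t) for c, t in zip(predictions, targets))

if __name__ == "__main__":
    targets = [12.0, 20.0, 31.0]       # illustrative target values T_j
    predictions = [10.0, 20.0, 33.0]   # illustrative model outputs C(i,j)
    print(absolute_error_fitness(predictions, targets, selection_range=100.0))
```

A chromosome that reproduces every target exactly attains the maximum fitness, M multiplied by the number of fitness cases.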
(ii) The second step is to choose the set of terminals and
functions to create the chromosomes. In this problem, the
terminal set consists of a single independent variable, i.e. {T}.
The choice of the appropriate function set is useful in obtaining
a simplified mathematical expression. Hence, in this study,
basic mathematical operators (+, −, ×, ÷, √) were used.
(iii) The third major step is to choose the chromosomal
architecture, i.e., the length of the head and the number of
genes. Initially a single gene with a head length of two was
used, and in each successive run the number of genes and the
head length were increased by one at a time until the most
appropriate fit was obtained. It was observed that more than six
genes and a head length greater than ten did not significantly
improve the performance of the GEP model. Thus, a head length of
h = 10 and six genes per chromosome were employed for the
present GEP model.
(iv) The fourth major step is to choose the linking function. In
this study, addition was used as a linking function.
(v) The fifth and final step is to choose the set of genetic
operators and their rates. A combination of all genetic operators
(mutation, transposition and crossover) was used for this
purpose (Table 1).
Table 1: GEP optimal model parameters
S. No.   Parameter                       Setting
1.       Population size                 54
2.       Genes per chromosome            6
3.       Gene head length                10
4.       Functions                       +, −, ×, ÷, √
5.       Gene tail length                11
6.       Mutation rate                   0.044
7.       Inversion rate                  0.1
8.       Gene transposition rate         0.1
9.       One point recombination rate    0.3
10.      Two point recombination rate    0.3
11.      Gene recombination rate         0.1
The equation obtained from GEP is given as:
(6)
The simplified version of Eq. 6 may be obtained as:
(7)
where Q = flood discharge in m³/s and T = recurrence interval
in years.
The corresponding expression tree (ET) for the above equation
is given in Fig. 2.
Fig. 2: Expression tree corresponding to the simplified equation of GEP
Derivation of Peak Flood relation based on Gumbel's Method
The equation for the flood discharge with the return period in
terms of the frequency factor, based on Gumbel’s method and the
same data set as used in the GEP model, has been obtained as:
(8)
Derivation of ANN Prediction model
A commonly used Feed Forward Back Propagation (FFBP)
ANN has been developed in the MATLAB
environment for peak flood estimation in the present study. The
Levenberg-Marquardt algorithm was used for faster
training. The method involves training the ANN with the return
period (T) as input and the peak flood (Q) as output. Thus the
number of neurons in the input and output layers is one
each. The optimal number of neurons in the hidden layer for
the same data as used in the earlier flood prediction models was
found to be 2.
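The size of such a network can be made concrete with a forward pass through a 1-2-1 architecture. The sketch below is illustrative only: the tanh hidden activation, the linear output neuron, the function names and the weight values are all assumptions, since the paper does not state them, and the Levenberg-Marquardt training step is not reproduced.

```python
import math

def forward(T, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input T -> tanh hidden neurons -> linear output."""
    hidden = [math.tanh(w * T + b) for w, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def parameter_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)

if __name__ == "__main__":
    # Hypothetical weights, for illustration only (not fitted to any data).
    q = forward(0.5, w_hidden=[0.8, -0.3], b_hidden=[0.1, 0.2],
                w_out=[1.5, 0.7], b_out=0.05)
    print(q, parameter_count(1, 2, 1))
```

With one input, two hidden neurons and one output, the network has only seven adjustable parameters, which gives a sense of how compact the ANN model is.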
Performance assessment of Prediction models
The performance of the various flood prediction models was
analysed by computing the correlation coefficient (R) and root
mean square error (RMSE) values for the predicted and
observed peak floods. A low RMSE value and a high
correlation indicate good performance of the applied methods.
The quantitative performance of Gumbel's distribution, ANN
and GEP is shown in Table 2. It may be observed that
the GEP model gives the highest value of R (0.997) and the
lowest value of RMSE (0.046). This indicates that the
performance of GEP is the best among the prediction models.
The table also indicates that the performance of the ANN model
is better than Gumbel's method and poorer than the GEP model.
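The two performance measures can be computed as sketched below. The function names and the sample numbers are illustrative assumptions; the paper does not state how the data were scaled before the RMSE values of Table 2 were computed, so this sketch makes no attempt to reproduce them.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def correlation(observed, predicted):
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

if __name__ == "__main__":
    observed = [0.20, 0.35, 0.50, 0.80]    # illustrative values only
    predicted = [0.22, 0.33, 0.55, 0.78]
    print(round(correlation(observed, predicted), 3), round(rmse(observed, predicted), 3))
```

Note that R measures only how linearly the predictions follow the observations, so a model can have R close to 1 and still be biased; reporting RMSE alongside R guards against that.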
A qualitative assessment of the various flood prediction models
is given in Fig. 3. It may be observed that there is only a
slight difference between the flood predictions for lower return
periods (say, less than 45 years at the present site), but a
large difference may be observed at higher return periods,
particularly between the soft computing techniques and the
conventional method. This
difference in prediction appears to increase further with
increasing return period. However, there is only a minor
difference in prediction between the ANN and GEP models.
Fig. 4 shows the variation of observed with predicted flood
discharge. A perusal of Fig. 4 indicates that the performance of
the GEP and ANN models is better than that of Gumbel’s method.
Fig. 3: A comparison of flood prediction models (a) on ordinary scale (b) on semi-log scale.
Fig. 4: A relation of observed and predicted discharge
Table 2: Performance of prediction models
Modelling technique      R        RMSE
Gumbel’s distribution    0.969    0.085
ANN                      0.996    0.057
GEP                      0.997    0.046
CONCLUSION
An attempt was made to assess the performance of the gene
expression programming model, the ANN model and
Gumbel's method for flood frequency analysis for a typical
Indian river gauging site. The performance of GEP was found
to be the best among the prediction models under
consideration. The GEP model also provides a mathematical
equation, which makes it superior to those soft
computing techniques (like ANN) that do not give any
analytical mathematical equation. The results of this study are
highly promising and suggest that GEP modelling is a versatile
technique and represents an improved alternative to the more
conventional approaches for flood frequency analysis.
REFERENCES
1. Abbulu Y., Laxman P. and Bhadrudu V. K., 2013. “Flood
Analysis of Reservoirs in Visakhapatnam District By Using
Probability Methods”, International Journal of Civil
Engineering (IJCE), Nov 2013, vol. 2 (5), 17-24.
2. ASCE Task Committee, 2000a. “Artificial Neural
Networks in Hydrology I: Preliminary Concepts”,
Journal of Hydrologic Engineering, 5(2), 116-124.
3. ASCE Task Committee, 2000b. “Artificial Neural
Networks in Hydrology II: Hydrologic Applications”,
Journal of Hydrologic Engineering, 5(2), 124-137.
4. Azamathulla H. M. and Haque A. A., 2012. “Prediction of
Scour Depth at Culvert Outlets using Gene-Expression
Programming”, International Journal of Innovative
Computing Information and Control, July 2012, vol. 8
(7B), 5045-5054.
5. Azamathulla H. M., Ghani A., Leow C. S., Chang C. K.
and Zakaria N. A., 2011. “Gene-Expression Programming
for the Development of a Stage-Discharge Curve of the
Pahang River”, Water Resource Management vol. 25,
2901–2916.
6. Aziz, Rahman A., Shamseldin A. and Shoaib M. 2013.
Regional flood estimation in Australia: Application of gene
expression programming and artificial neural network
techniques, 20th International Congress on Modelling and
Simulation, Adelaide, Australia, December 2013.
7. Dhar and Shobha N., 2003. “Hydro meteorological
Aspects of Floods in India”, Natural Hazards 2003, (28),
1–33.
8. Ferreira C., 2001. Gene Expression Programming in
Problem Solving, invited tutorial of the 6th Online World
Conference on Soft Computing in Industrial Applications,
September 10-24 2001.
9. Ferreira C., 2001. “Gene Expression Programming: A
New Adaptive Algorithm for Solving Problems”, Complex
Systems, Vol. 13 (2), 87-129.
10. Gaal L., Szolgay J, Kohnova S, Lavcova K. and Viglione
A., 2010. “Inclusion of Historical Information in Flood
Frequency Analysis Using a Bayesian MCMC Technique:
a case study for the Power dam Orlik, Czech Republic”,
2010, vol. 40 (2), 121–147.
11. Khalid E. and Negm A, 2008. “Performance Evaluation of
Gene Expression Programming for Hydraulic Data
Mining”, International Journal of Information
Technology, April 2008, vol. 5 (2), 126-131.
12. Mujahid, Azamathulla, Tufail and Ghani, 2012. Bridge
pier scour prediction by gene expression programming,
Proceedings of the Institution of Civil Engineers, Water
Management, October 2012, vol. 165 WM9, 481–493
http://dx.doi.org/10.1680/wama.11.00008 .
13. Mujere N., 2011. “Flood Frequency Analysis Using the
Gumbel Distribution”, International Journal on Computer
Science and Engineering (IJCSE), July 2011, vol. 3 (7),
2774-2778.
14. Nadarajah and Shiau J. T., 2005. “Analysis of Extreme
Flood Events for the Pachang River, Taiwan”, Water
Resources Management 2005, (19), 363–374.
15. Rostami R. and Rahnama M. B., 2007. “Halil-River Basin
Regional Flood Frequency Analysis Based on L-moment
Approach”, International Journal of Agricultural
Research 2 (3), 261-267.
16. Seckin N. and Guven A., 2012. “Estimation of Peak Flood
Discharges at Ungauged Sites across Turkey”, Water
Resource Management, April 2012, vol. 26, 2569–2581.
17. Sherrod P. H., Technical Reference Manual of DTREG-
Predictive Modelling Software, http://www.dtreg.com,
303-317.
18. Shreenivas N. L. and Pradnya R. D., 2012. “Genetic
Programming: A Novel Computing Approach in Modelling
Water Flows”, http://dx.doi.org/10.5772/48179.
19. Stedinger J. R. and Griffis V. W., 2008. “Flood Frequency
Analysis in the United States: Time to Update”, Journal of
Hydrologic Engineering April 2008, 199-204.
20. Subramanya K., 2010. “Engineering Hydrology”, Tata
McGraw Hill, Third Edition 2010, 255-257.
21. Varshney R. S., 1979. “Engineering Hydrology”, Nem
Chand & Bros., Roorkee 1979, 603-605.
22. Zahiri A. R. and Eghbali P., 2012. “Gene Expression
Programming for Prediction of Flow Discharge in
Compound Channels”, Journal of Civil Engineering and
Urbanism, vol. 2 (4), 164-169.
