Hydrological models have been increasingly used in recent decades for flood forecasting and water resources management, among other applications. A proper assessment of the reliability of hydrological simulations is therefore of utmost importance to enhance societal confidence in model predictions. To date, several goodness-of-fit measures have been proposed to assess model performance and to measure the agreement between observations and their simulated equivalents. Despite serious and well-known limitations, many single-objective goodness-of-fit measures are still in widespread use. As an example, the Nash-Sutcliffe efficiency (NSE) has been highly criticised as an inappropriate benchmark for comparing modelling results to observations; nonetheless, it is still one of the most common performance measures used by both practitioners and environmental scientists.
This work examines how different goodness-of-fit measures reported in the literature behave when used within a single-objective optimisation procedure, both for the identification of the model's parameters and in the reproduction of high- and low-flow events. To this end, several parameters of the semi-distributed Soil and Water Assessment Tool (SWAT) 2005 are calibrated using a novel global optimisation technique called Particle Swarm Optimization (PSO), changing only the goodness-of-fit measure to be optimised. In particular, the Kling-Gupta efficiency (KGE), the volumetric efficiency (VE), the NSE, and the index of agreement (d), along with modified and relative versions of the last two measures, are compared for a calibration of SWAT aiming at reproducing low flows. In turn, KGE, NSE, d, and the coefficient of persistence (cp) are compared for a calibration aiming at reproducing high flows. In addition, a relatively new goodness-of-fit measure is introduced which, when used along with an ad-hoc weighting scheme, captures model errors in the high or low part of the analysed time series with little influence from errors in other portions of the signal.
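As a rough illustration of four of the measures named above, the following sketch implements NSE, d, VE and KGE with the Python standard library only. The function names and the `OBJECTIVES` dictionary are hypothetical, and population statistics are assumed for the standard deviations and covariance in KGE; this is not the code used in the study.

```python
import math
from statistics import mean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((O-S)^2) / sum((O-mean(O))^2)."""
    o_bar = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, sim):
    """Index of agreement d, with the potential error in the denominator."""
    o_bar = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - o_bar) + abs(o - o_bar)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def volumetric_efficiency(obs, sim):
    """Volumetric efficiency: 1 - sum(|O-S|) / sum(O)."""
    return 1.0 - sum(abs(o - s) for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    mu_o, mu_s = mean(obs), mean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)  # assumption: population statistics
    cov = mean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_s * sd_o)   # Pearson correlation coefficient
    alpha = sd_s / sd_o       # ratio of standard deviations
    beta = mu_s / mu_o        # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical registry: a calibration loop could swap the objective by name
# while leaving everything else unchanged.
OBJECTIVES = {
    "NSE": nse,
    "d": index_of_agreement,
    "VE": volumetric_efficiency,
    "KGE": kge,
}
```

All four functions return 1 for a perfect simulation, so an optimiser that minimises would use `1 - OBJECTIVES[name](obs, sim)` as its cost.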
Optimal parameter values varied considerably depending on the objective function used in PSO. Discharge values obtained during calibration are used to compute Empirical Cumulative Distribution Functions (ECDFs) for three different quantiles representative of simulated low, medium and high flows. Simulated quantiles computed with the new goodness-of-fit function in combination with the ad-hoc weighting scheme were closer to their observed counterparts. The results provide quantitative guidance about the bias of the calibrated hydrological model in reproducing low and high flows when different well-known goodness-of-fit measures are used as the objective function during calibration. This guidance facilitates the elaboration of standards about which benchmarks to use when trying to represent different extreme events.
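The new measure and its weighting scheme can be sketched as follows, based on the definitions given in the poster text: a weight of λ for observations at or above a high-flow threshold O_H, 1−λ at or below a low-flow threshold O_L, and a linear transition in between, with the monthly median of the observations as the reference in the denominator. The function names are hypothetical, the lowest branch is read as O_i ≤ O_L, and the thresholds are assumed to be supplied by the caller; this is an illustrative sketch rather than the authors' implementation.

```python
from statistics import median


def ws_weights(obs, o_low, o_high, lam):
    """Weights in [0, 1]: lam above o_high, 1 - lam below o_low, linear between."""
    weights = []
    for o in obs:
        if o >= o_high:
            w = lam
        elif o <= o_low:  # assumption: lowest branch applies for O_i <= O_L
            w = 1.0 - lam
        else:
            w = (1.0 - lam) + (2.0 * lam - 1.0) * (o - o_low) / (o_high - o_low)
        weights.append(w)
    return weights


def monthly_median_reference(obs, months):
    """For each time step, the median of all observations in the same month."""
    by_month = {}
    for o, m in zip(obs, months):
        by_month.setdefault(m, []).append(o)
    medians = {m: median(values) for m, values in by_month.items()}
    return [medians[m] for m in months]


def wsnse(obs, sim, months, o_low, o_high, lam, j=2):
    """Weighted seasonal NSE: 1 - sum|w(O-S)|^j / sum|w(O-O_hat)|^j."""
    w = ws_weights(obs, o_low, o_high, lam)
    ref = monthly_median_reference(obs, months)
    num = sum(abs(wi * (o - s)) ** j for wi, o, s in zip(w, obs, sim))
    den = sum(abs(wi * (o - r)) ** j for wi, o, r in zip(w, obs, ref))
    return 1.0 - num / den
```

With λ close to 1 the errors on high flows dominate the score, and with λ close to 0 the errors on low flows dominate, which matches the settings reported in the poster for the high-flow (j=2, λ=0.95) and low-flow (j=1, λ=0) calibrations.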
Comparing Goodness-of-fit Measures for Calibration of Models Focused on Extreme Events (EGU 2012)
Mauricio Zambrano-Bigiarini and Alberto Bellin
Joint Research Centre
EGU2012-11549, Session: HS1.3, Apr 23rd, 2012
1) Motivation
Despite serious and well-known limitations (e.g., Legates and McCabe, 1999), many single-objective goodness-of-fit measures are still in widespread use. As an example, the Nash-Sutcliffe efficiency (NSE) has been highly criticised as an inappropriate benchmark for comparing modelling results to observations (e.g., Schaefli and Gupta, 2007); nonetheless, it is still one of the most common performance measures used by both environmental scientists and practitioners.

2) Aim
To provide practical guidance about how different goodness-of-fit measures reported in the literature perform when used within a single-objective optimisation procedure, both for the identification of model parameters and in the reproduction of high- and low-flow events.

3) Methodology

3.1) Study Area
Fig 01. Location of the Ega River Basin, meteorological stations, and discharge station used for the calibration of the upper catchment.
Fig 02. Discharge time series corresponding to the outlet of the upper part of the Ega River Basin (Ega en Estella stream gauge, Q071). Horizontal blue and red lines show the discharge values used to separate high and low flows, respectively (see Fig 03).
Fig 03. Daily flow duration curve corresponding to the outlet of the upper part of the Ega River Basin (Q071). Vertical black lines show the discharge values used to separate high and low flows (following Yilmaz et al., 2008).
Fig 04. Weighting values used in wsNSE, which take into account the skewness of the observed daily discharges. Red and blue lines correspond to the weights used when focusing on low- and high-flow events, respectively. On the left panel, the horizontal axis represents discharge values of the Ega River Basin (Q071), while on the right panel the horizontal axis represents the empirical CDF of the observed discharge values.

3.2) Calibration Procedure
The Soil and Water Assessment Tool (SWAT) version 2005 was calibrated for the period Jan/1961-Dec/1970, using the first year as warming-up period.
A set of 9 parameters was selected for calibration:

Parameter                                                  Name       Min        Max
Base flow alpha factor [days]                              ALPHA_BF   1.00E-1    9.90E-1
Manning's "n" value for the main channel [-]               CH_N2      1.60E-2    1.50E-1
Initial SCS CN II value [-]                                CN2        4.00E+1    9.50E+1
Saturated hydraulic conductivity [mm/hr]                   SOL_K      1.00E-3    1.00E+3
Available water capacity [mmH2O/mm soil]                   SOL_AWC    1.00E-2    3.50E-1
Effective hydraulic conductivity in main channel [mm/hr]   CH_K2      0.00E+0    2.00E+2
Soil evaporation compensation factor [-]                   ESCO       1.00E-2    1.00E+0
Surface runoff lag time [days]                             SURLAG     1.00E+0    1.20E+1
Snowfall temperature [°C]                                  SFTMP      -5.00E+0   5.00E+0

Calibration was carried out using Particle Swarm Optimisation (PSO, Kennedy and Eberhart, 1995), with 20 particles, 300 iterations, ω=1/(2log2), c1=c2=0.5+log2, linearly decreasing Vmax from 1.0 to 0.5, and random topology (see EGU2012-10950 for further details).

3.1) Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970):
\[ NSE = 1 - \frac{\sum_{i=1}^{N} (O_i - S_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2} \]

3.2) Index of Agreement (Willmott, 1981):
\[ d = 1 - \frac{\sum_{i=1}^{N} (O_i - S_i)^2}{\sum_{i=1}^{N} \left( |S_i - \bar{O}| + |O_i - \bar{O}| \right)^2} \]

3.3) Coefficient of Persistence (Kitanidis & Bras, 1980):
\[ cp = 1 - \frac{\sum_{i=2}^{N} (S_i - O_i)^2}{\sum_{i=2}^{N} (O_i - O_{i-1})^2} \]

3.4) Volumetric Efficiency (Criss and Winston, 2008):
\[ VE = 1 - \frac{\sum_{i=1}^{N} |O_i - S_i|}{\sum_{i=1}^{N} O_i} \]

3.5) Kling-Gupta efficiency (Gupta et al., 2009):
\[ KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2} \]
\[ r = \frac{Cov_{s,o}}{\sigma_s \sigma_o}, \qquad \alpha = \frac{\sigma_s}{\sigma_o}, \qquad \beta = \frac{\mu_s}{\mu_o} \]

3.6) Seasonal Nash-Sutcliffe (adapted from Garrick et al., 1978):
\[ sNSE = 1 - \frac{\sum_{i=1}^{N} |O_i - S_i|^j}{\sum_{i=1}^{N} |O_i - \hat{O}_i|^j} \]

3.6) Relative Nash-Sutcliffe efficiency (Krause et al., 2005):
\[ rNSE = 1 - \frac{\sum_{i=1}^{N} \left( \frac{O_i - S_i}{O_i} \right)^2}{\sum_{i=1}^{N} \left( \frac{O_i - \bar{O}}{\bar{O}} \right)^2} \]

3.7) Relative Index of Agreement (Krause et al., 2005):
\[ rd = 1 - \frac{\sum_{i=1}^{N} \left( \frac{O_i - S_i}{O_i} \right)^2}{\sum_{i=1}^{N} \left( \frac{|S_i - \bar{O}| + |O_i - \bar{O}|}{\bar{O}} \right)^2} \]

3.8) Modified Nash-Sutcliffe efficiency (Legates and McCabe, 1999):
\[ mNSE = 1 - \frac{\sum_{i=1}^{N} |O_i - S_i|^j}{\sum_{i=1}^{N} |O_i - \bar{O}|^j} \]

3.9) Modified Index of Agreement (Legates and McCabe, 1999):
\[ d_j = 1 - \frac{\sum_{i=1}^{N} (O_i - S_i)^j}{\sum_{i=1}^{N} \left( |S_i - \bar{O}| + |O_i - \bar{O}| \right)^j} \]

3.10) Weighted Seasonal Nash-Sutcliffe (this work):
\[ wsNSE = 1 - \frac{\sum_{i=1}^{N} |\omega_i (O_i - S_i)|^j}{\sum_{i=1}^{N} |\omega_i (O_i - \hat{O}_i)|^j} \]
\[ \omega_i = \begin{cases} \lambda, & O_i \geq O_H \\ (1 - \lambda) + \dfrac{(2\lambda - 1)(O_i - O_L)}{O_H - O_L}, & O_L < O_i < O_H \\ 1 - \lambda, & O_i \leq O_L \end{cases} \]

Nomenclature
● $S_i$ : i-th simulated value
● $O_i$ : i-th observed value
● $j$ : arbitrary power, i.e., a positive integer
● $\bar{O}$ : mean observed value
● $\hat{O}_i$ : median of the observed values in the same month as $O_i$
● $r$ : Pearson's product-moment correlation coefficient
● $\alpha$ : ratio between the standard deviation of simulations ($\sigma_s$) and observations ($\sigma_o$)
● $\beta$ : ratio between the mean of the simulations ($\mu_s$) and observations ($\mu_o$)
● $\omega_i$ : weight in [0, 1] applied to both observed and simulated values at time step i
● $\lambda$ : number in [0, 1] representing the weight given to the high-flow part of the signal, with $\lambda$ close to 1 when focusing on high-flow events, and $\lambda$ close to zero when focusing on low-flow conditions
● $O_L$, $O_H$ : user-defined thresholds used to separate low and high values, respectively. In order to avoid subjectivity in the selection of $O_L$ and $O_H$, we use the flow duration curve criterion proposed by Yilmaz et al., 2008

4) Results
Fig 05. Boxplots summarizing the parameter values obtained with calibrations focused on high flows. Only the best half of the total parameter sets obtained during calibration are considered for each calibration exercise.
Fig 06. Boxplots summarizing the 50th and 95th percentiles of daily discharge obtained with calibrations focused on high flows. Only the best half of the total parameter sets obtained during calibration are considered for each calibration exercise. Grey vertical lines indicate the value of the observed 5th and 50th percentiles.
Fig 07. Boxplots summarizing the parameter values obtained with calibrations focused on low flows. Only the best half of the total parameter sets obtained during calibration are considered for each calibration exercise.
Fig 08. Boxplots summarizing the 5th and 50th percentiles of daily discharge obtained with calibrations focused on low flows. Only the best half of the total parameter sets obtained during calibration are considered for each calibration exercise. Grey vertical lines indicate the value of the observed 5th and 50th percentiles.

8) Conclusions
● A large underestimation of observed low flows (~40%) was obtained with simulated values calibrated by using NSE and KGE. Such underestimation is commonly masked by the good overall fit in terms of NSE, KGE and other statistics.

Low-flows calibrations:
● rNSE and wsNSE (j=1, λ=0) perform the best (in terms of the 5th percentile), and both of them also provide a good representation of medium flows (50th percentile), with all the other measures overestimating them.
● NSE, KGE and d tend to underestimate GW_DELAY in comparison to the other goodness-of-fit measures.

High-flows calibrations:
● wsNSE (j=2, λ=0.95), d and KGE perform the best (in terms of the 95th percentile). At the same time, only wsNSE provides a good representation of medium flows (50th percentile), with all the other goodness-of-fit measures overestimating them.
● The optimal value and variability of the parameters related to the slow response of the catchment (GW_DELAY and ALPHA_BF) were very similar among all the goodness-of-fit measures tested (both close to zero), with KGE and wsNSE presenting the largest spread. However, those values are very different from the ones obtained during the calibration focused on low flows.

References:
● Criss, R., Winston, W., 2008. Do Nash values have value? Discussion and alternate proposals. Hydrological Processes 22, 2723–2725.
● Garrick, M., Cunnane, C., Nash, J.E., 1978. A criterion of efficiency for rainfall-runoff models. Journal of Hydrology 36, 375–381.
● Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948.
● Kitanidis, P.K., Bras, R.L., 1980. Real-time forecasting with a conceptual hydrologic model 2. Applications and results. Water Resources Research 16, 1034–1044.
● Krause, P., Boyle, D., Bäse, F., 2005. Comparison of different efficiency criteria for hydrological model assessment. Advances in Geosciences 5, 89–97.
● Legates, D., McCabe Jr., G., 1999. Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation. Water Resources Research 35, 233–241.
● Schaefli, B., Gupta, H., 2007. Do Nash values have value? Hydrological Processes 21, 2075–2080.
● Willmott, C., 1981. On the validation of models. Physical Geography 2, 184–194.
● Yilmaz, K., Gupta, H., Wagener, T., 2008. A process-based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research 44, W09417.

Mauricio Zambrano-Bigiarini
European Commission • Joint Research Centre • Institute for Environment and Sustainability
Tel. +39 0332 789588 • Email: mauricio.zambrano@jrc.ec.europa.eu
www.jrc.europa.eu