## Abstract

Hydrological models are developed for different purposes, including flood forecasting, design flood estimation, water resources assessment, and impact studies of climate change and land use change. In this study, the applicability and uncertainty of two deterministic lumped models, the Xinanjiang (XAJ) model and the Hydrologiska Byråns Vattenbalansavdelning (HBV) model, in design flood estimation are evaluated in a data-rich catchment in southern China. The uncertainties of the estimated design flood caused by model equifinality and by the choice of calibration data period are then assessed using the generalized likelihood uncertainty estimation (GLUE) framework. The results show that: (1) the XAJ model tends to overestimate the design flood, while the HBV model underestimates it; (2) model parameter equifinality has a significant impact on the design flood estimates; (3) for the same length of calibration period, the design flood estimates are significantly influenced by which period of the data is used for model calibration; and (4) 15–20 years of calibration data appear necessary and sufficient for calibrating the two models in the study area.

- continuous simulation
- flood frequency
- GLUE
- HBV model
- L-moments
- Xinanjiang model

## INTRODUCTION

Precise determination of the design flood is of great importance in hydrologic design and in the risk assessment of hydraulic engineering. In designing hydraulic schemes, the design flood is usually estimated through flood frequency analysis of long-term historical hydrological data, whose length significantly influences the quality of the estimate. However, because many basins lack observed hydrological data of sufficient length and quality, the required consistency, homogeneity and stationarity of the estimates are hard to achieve. Under these circumstances, rainfall-runoff models can be used to simulate the runoff series required for frequency analysis. Rainfall-runoff models are widely used in flood forecasting (e.g. Refsgaard *et al.* 1988; Mwale *et al.* 2014), water resources assessment (e.g. Xu *et al.* 1996; Kizza *et al.* 2013; Hailegeorgis & Alfredsen 2016), regional and global water balance calculation (e.g. Arnell 1999; Li *et al.* 2013a, 2013b), assessment of the impacts of climate change and land use change (e.g. Bastola *et al.* 2011; Lawrence & Haddeland 2011; Gosling 2014; McIntyre *et al.* 2014; Emam *et al.* 2015; Singh *et al.* 2015; Yan *et al.* 2016) and streamflow simulation in ungauged catchments (e.g. Xu 1999; McIntyre *et al.* 2005; Murray & Bloschl 2011; Mwale *et al.* 2014). Among these applications, most emphasis has been put on flood control and water resources management (Pechlivanidis *et al.* 2011), while only a few applications to design flood estimation have been reported and discussed (Boughton & Droop 2003; Ngongondo *et al.* 2013). Because precipitation, potential evapotranspiration and temperature records are more widely available and longer than runoff observations (Blazkova & Beven 2002), they are used as the main inputs to rainfall-runoff models for the runoff simulations from which design floods are estimated.

The use of rainfall-runoff models for design flood estimation can be classified into two categories (Boughton & Droop 2003). The first is event-based simulation, in which a rainfall-runoff model is driven by a design rainfall event and assumed antecedent conditions of the catchment. The second is continuous simulation, in which the model is driven by historical or simulated rainfall to produce a continuous runoff series, from which the design flood is estimated. The event-based approach often assumes that the T-year design rainfall event generates the T-year flood event (Bradley & Potter 1992; Smithers *et al.* 1997). However, this pragmatic assumption clearly does not represent well the complex relationship between design rainfall and design flood (Brigode *et al.* 2014). In the event-based approach, the input design rainfall can be derived from historical records or from stochastic rainfall event simulations. Saghafian *et al.* (2014) argued that runoff simulated by rainfall-runoff models from precisely observed rainfall may be less uncertain than runoff converted from measured water levels through stage-discharge rating curves. They applied this event-based method to the Tangrah watershed in north-eastern Iran for flood frequency analysis and found that the design floods derived from simulated flood peaks were lower than those derived from observations (Saghafian *et al.* 2014). With the rapid development of computing facilities, continuous simulation has been used for design flood estimation (Beven 1987; Calver & Lamb 1995; Lamb 1999; Suman & Bhattacharya 2015). Calver & Lamb (1995) performed continuous simulation for 10 UK catchments, using historical rainfall records as input to two models, the probability-distributed model and the time-area topographic extension model, for flood frequency analysis.
Lamb (1999) discussed the calibration of rainfall-runoff models used for design flood estimation by continuous simulation. The continuous simulation approach can also be used at data-limited or ungauged sites if good relationships can be established between the model parameters and the characteristics of the river basin (Saliha *et al.* 2011). Discussions and applications of continuous simulation at ungauged sites are presented by Blazkova & Beven (2002) and Smithers *et al.* (2013).

The merits and drawbacks of these two design flood estimation approaches have been analysed by many researchers (Lamb 1999; Boughton & Droop 2003). The continuous simulation approach is regarded as more promising because it avoids the problem of specifying antecedent conditions (Calver & Lamb 1995), whereas the event-based approach requires assumptions about antecedent conditions, which introduce uncertainty.

Four important sources of uncertainty exist in hydrological modelling: uncertainty in input data, in output data used for calibration, in model parameters and in model structure (Refsgaard & Storm 1996). Consequently, modelling results are uncertain, and the equifinality problem has been found universally in hydrological models. The uncertainty of rainfall-runoff models for flood forecasting has been discussed in many studies (Cameron *et al.* 2000; Beven & Freer 2001; Blazkova & Beven 2004; Li *et al.* 2009, 2010; Wang *et al.* 2015; Tian *et al.* 2016); however, the uncertainty inherent in continuous simulation for design flood estimation has drawn little attention in the hydrological literature. Brigode *et al.* (2014) used the bootstrap method to analyse the uncertainties in SCHADEX, a semi-continuous extreme flood estimation method developed by Paquet *et al.* (2013). Their results showed that the variability of observed rainfall and differences in the calibration periods of the rainfall-runoff model had a significant impact on the design flood estimates.

In recent years, many researchers have coupled stochastic rainfall models with rainfall-runoff models to generate long runoff series, in order to reduce the uncertainty caused by extrapolating distribution curves fitted to observed peak discharges (Boughton & Hill 1997; Cameron *et al.* 1999; Blazkova & Beven 2002). Cameron *et al.* (1999) coupled TOPMODEL with a stochastic rainfall generator to estimate the 1,000-year flood in the Wye catchment in the UK, and used the generalized likelihood uncertainty estimation (GLUE) framework to analyse the uncertainty of the continuous simulation approach. These studies indicate that, compared with the large number of studies on using hydrological models for flood forecasting, water resources assessment, and impact studies of climate and land use change, more effort needs to be directed to the usefulness and uncertainty of hydrological models in design flood studies.

The objectives of this paper are (i) to investigate the applicability of lumped conceptual deterministic hydrological models to design flood estimation, and (ii) to analyse the uncertainties inherent in the hydrological model-design flood model approach. To achieve these objectives, the following tasks are performed: (1) historical rainfall data are used to run two rainfall-runoff models, the Xinanjiang (XAJ) model and the Hydrologiska Byråns Vattenbalansavdelning (HBV) model, to simulate runoff series of the same length as the observed runoff; (2) design flood estimates derived from the simulated and observed runoff are compared to analyse the feasibility of using model-simulated runoff to estimate the design flood; and (3) the uncertainty caused by model equifinality is evaluated with the GLUE method, and the uncertainty induced by the choice of calibration period is evaluated by calibrating the models on different sub-records of the runoff observations.

## DATA AND METHODS

### Study area and data

The study was conducted in the Xiangjiang basin, a tributary basin of the Yangtze River located between 24–29° N and 110°30′–114° E in central-south China, with a total area of 94,660 km² and a total river length of 856 km. Mountainous landscapes are found in the east, south and west of the basin, while its northern part consists of plains and hills. The Xiangjiang River originates in the southern mountainous region, flows north through the plains and finally discharges into Dongting Lake. The climate of the basin is controlled by the Mongolian high pressure system in winter and influenced by the southeast monsoon in summer, which results in an uneven spatial and temporal distribution of precipitation. The mean annual precipitation is 1,450 mm, of which 60–70% falls in the rainy season from April to September. The mean annual temperature is about 17 °C, and the mean temperature of the coldest month (January) is about 4 °C. The study area is a sub-catchment in the upper part of the Xiangjiang basin with a drainage area of 52,150 km²; its outlet is the Hengyang runoff station, located at the centre of the Xiangjiang basin. The boundary of the study area is shown in Figure 1.

Forty-one years of daily hydro-meteorological data (from 1965 to 2005), including precipitation, pan evaporation and observed runoff, are used in this study. Among them, the precipitation data come from 97 precipitation stations, and the evaporation data come from 12 evaporation stations. The distribution of these stations is shown in Figure 1. The quality of these data is controlled by the Hydrology and Water Resources Bureau of Hunan Province, China. Many other studies (Xu *et al.* 2013, 2015a, 2015b) have used these data for various research purposes.

### Xinanjiang model

The Xinanjiang model was developed in the 1970s (Zhao *et al.* 1980). The main characteristic of this deterministic lumped model is the concept of runoff formation on repletion of storage: runoff is not generated until the soil moisture content reaches field capacity. This characteristic makes the model well suited to continuous hydrological simulation in humid and semi-humid regions. The XAJ model has been widely and successfully used in China and many other countries for hydrological forecasting and flow simulation (Zhao 1992; Zhao *et al.* 1995; Li *et al.* 2013a, 2013b). The flowchart of the model is shown in Figure 2. Symbols inside the boxes of Figure 2 are variables, including model inputs, model outputs and state variables. The inputs to the model are daily precipitation, *P*, and measured pan evaporation, EM. The outputs are the simulated runoff of the whole basin, *Q*_{sim}, and the actual evapotranspiration, *E*, which has three components: upper soil layer evaporation (EU), lower soil layer evaporation (EL) and deep soil layer evaporation (ED). Symbols outside the boxes of Figure 2 are model parameters, which are explained in Table 1.

In the XAJ model, the catchment is represented by a stack of three horizontal soil layers with a total water storage capacity of WM: the upper, lower and deep soil layers, with water storage capacities UM, LM and DM, respectively. The potential evapotranspiration rate (EP) is the product of the measured pan evaporation and the model parameter KE. Water stored in the upper layer evaporates first, at a rate EU. If this cannot meet the remaining evaporation capacity (EP − P), the lower layer evaporates at the rate of the remaining capacity (EP − P − EU) multiplied by the ratio between the lower-layer storage and its capacity, provided this ratio is larger than parameter *C*. Otherwise, all water stored in the lower layer is evaporated, and the deep layer evaporates at the rate of the remaining capacity (EP − P − EU) multiplied by parameter *C* if enough water is stored there; if not, all water in the deep layer is evaporated.
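The three-layer evaporation rule can be sketched as a single-time-step function. This is an illustrative sketch following the commonly published formulation of the XAJ scheme (Zhao 1992), not the authors' code; the function and argument names are hypothetical. Here EU counts the rainfall that evaporates directly plus the upper-layer storage, so `ep - eu` equals the remaining capacity EP − P − EU in the text. The standard formulation also has an intermediate case (the lower layer supplies *C* times the remaining capacity when its storage allows), which the prose folds into "otherwise".

```python
def xaj_evaporation(p, ep, wu, wl, wd, lm, c):
    """Split potential evapotranspiration EP into (EU, EL, ED) for one time step.

    p, ep      : precipitation P and potential evapotranspiration EP (= KE * EM)
    wu, wl, wd : current tension-water storage of the upper, lower and deep layers
    lm         : lower-layer storage capacity LM
    c          : deep-layer evapotranspiration coefficient C
    EU includes the rainfall that evaporates directly, so ep - eu below
    equals the remaining capacity EP - P - EU used in the text.
    """
    if wu + p >= ep:
        # Rainfall plus upper-layer storage meet the whole demand.
        return ep, 0.0, 0.0
    eu = wu + p                  # upper layer is emptied
    deficit = ep - eu            # remaining evaporation capacity
    if wl >= c * lm:
        # Lower layer evaporates in proportion to its relative storage WL/LM.
        return eu, deficit * wl / lm, 0.0
    if wl >= c * deficit:
        # Intermediate case of the standard scheme: lower layer supplies C * deficit.
        return eu, c * deficit, 0.0
    # Lower layer is emptied; the deep layer supplies the rest of C * deficit,
    # limited by the water actually stored there.
    return eu, wl, min(c * deficit - wl, wd)
```

Keeping the function pure (storages in, fluxes out) leaves the storage updates to the caller, which makes the water balance of each step easy to check.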

When precipitation exceeds potential evapotranspiration, runoff is generated in the areas where the soil water content has reached field capacity. The amount of runoff is determined from the rainfall and the soil storage deficit. The total runoff then enters a free water reservoir whose storage capacity is non-uniformly distributed over the area, with the distribution described by parameter EX. Through this free water reservoir, the total runoff is divided into three components: surface runoff, RS, interflow, RI, and groundwater runoff, RG. The surface runoff is routed to the outlet of the study area through a Nash cascade with parameters *n* and NK, while the interflow and groundwater runoff are routed through single linear reservoirs with recession coefficients CI and CG, respectively. The sum of the three routed components is the simulated runoff of the XAJ model.
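The routing stage can be sketched with the discrete linear-reservoir recursion Q_t = k·Q_{t−1} + (1 − k)·I_t, applied once for interflow (k = CI) and groundwater (k = CG), and *n* times in series for surface runoff. This is a minimal sketch under stated assumptions: *n* is an integer, the time step is one day, runoff is kept in depth units (the catchment-area conversion factor is omitted), and each surface reservoir uses the recession factor exp(−1/NK). It approximates a Nash cascade rather than reproducing any particular XAJ implementation; the function names are hypothetical.

```python
import math


def linear_reservoir(inflow, k):
    """Route a series through one linear reservoir: Q_t = k * Q_{t-1} + (1 - k) * I_t."""
    out, q = [], 0.0
    for i in inflow:
        q = k * q + (1.0 - k) * i
        out.append(q)
    return out


def xaj_routing(rs, ri, rg, n, nk, ci, cg):
    """Route surface runoff, interflow and groundwater runoff and sum them.

    rs, ri, rg : runoff components RS, RI, RG per time step (depth units)
    n, nk      : number of reservoirs (assumed integer) and storage constant NK (days)
    ci, cg     : recession coefficients CI and CG
    """
    qs = list(rs)
    k_surface = math.exp(-1.0 / nk)   # assumed discrete recession factor per reservoir
    for _ in range(n):                # cascade of n equal linear reservoirs
        qs = linear_reservoir(qs, k_surface)
    qi = linear_reservoir(ri, ci)     # interflow reservoir
    qg = linear_reservoir(rg, cg)     # groundwater reservoir
    return [a + b + c for a, b, c in zip(qs, qi, qg)]


# A single 10 mm surface-runoff pulse spreads out and peaks after day 0.
pulse = [10.0] + [0.0] * 29
print([round(q, 2) for q in xaj_routing(pulse, [0.0] * 30, [0.0] * 30, n=3, nk=2.0, ci=0.9, cg=0.99)[:5]])
```

Each reservoir conserves volume over a long enough horizon, so the routed series has the same total as the generated runoff; only its timing changes.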

### HBV model

For comparison, another widely used deterministic lumped conceptual model, the HBV model, developed by the Swedish Meteorological and Hydrological Institute in the early 1970s, is used in this study. The version used is HBV light (Seibert 1997). The snow routine of the HBV model is not used because snowfall is rare in the study region. The flowchart of the model is shown in Figure 3. The inputs are daily precipitation, *P*, and measured pan evaporation, EM. The outputs include daily actual evapotranspiration, *E*, and runoff at the outlet of the study basin, *Q*_{sim}. Model parameters are shown in bold in Figure 3, and their meanings are listed in Table 2.

In the HBV model, actual evaporation equals the input evaporation if the ratio between soil moisture storage, SM, and field capacity, FC, is larger than parameter LP; otherwise, it is reduced linearly (Figure 3). The input precipitation *P* is divided between soil moisture storage and groundwater recharge according to the value of SM/FC. Groundwater recharge is added to the upper groundwater box, whose storage is denoted SUZ. The upper groundwater box has three outlets: the fast flow outlet (*Q0*), the interflow outlet (*Q1*) and percolation (PERC) to the lower groundwater box. *Q0* occurs only when SUZ exceeds a threshold UZL. The lower groundwater box has a single linear outflow, *Q2*. The sum of the three groundwater outflows, *Q0*, *Q1* and *Q2*, is transformed by a triangular weighting function defined by parameter MAXBAS to give the simulated runoff.
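The soil and response routines can be sketched as a daily step plus the MAXBAS transformation. This is a sketch under assumptions, not the HBV light source code: the parameter names BETA, K0, K1 and K2 (recharge shape exponent and the three outflow coefficients) follow the usual HBV light naming and are not defined in the text above; K0 + K1 ≤ 1 is assumed so that SUZ cannot become negative; and the MAXBAS weights are obtained by integrating a triangle of base MAXBAS over each day. The function names are hypothetical.

```python
import math


def hbv_step(p, ep, state, par):
    """One daily HBV step without the snow routine; returns (groundwater outflow, Ea, new state)."""
    sm, suz, slz = state["SM"], state["SUZ"], state["SLZ"]
    fc = par["FC"]
    # Soil routine: the recharge share of P grows with (SM/FC)**BETA.
    recharge = p * min((sm / fc) ** par["BETA"], 1.0)
    sm += p - recharge
    if sm > fc:                                  # water above field capacity also recharges
        recharge += sm - fc
        sm = fc
    # Actual evaporation equals EP when SM/FC >= LP, reduced linearly below that.
    ea = min(ep * min(sm / (par["LP"] * fc), 1.0), sm)
    sm -= ea
    # Response routine: upper box (SUZ) and lower box (SLZ).
    suz += recharge
    perc = min(par["PERC"], suz)                 # percolation to the lower box
    suz -= perc
    slz += perc
    q0 = par["K0"] * max(suz - par["UZL"], 0.0)  # fast flow only above threshold UZL
    q1 = par["K1"] * suz                         # interflow
    q2 = par["K2"] * slz                         # linear outflow of the lower box
    suz -= q0 + q1
    slz -= q2
    return q0 + q1 + q2, ea, {"SM": sm, "SUZ": suz, "SLZ": slz}


def maxbas_weights(maxbas):
    """Daily weights of a triangular function with base MAXBAS (days); they sum to 1."""
    def cdf(x):
        if x <= 0.0:
            return 0.0
        if x >= maxbas:
            return 1.0
        if x <= maxbas / 2.0:
            return 2.0 * x * x / maxbas ** 2
        return 1.0 - 2.0 * (maxbas - x) ** 2 / maxbas ** 2
    return [cdf(i) - cdf(i - 1) for i in range(1, math.ceil(maxbas) + 1)]


def hbv_run(precip, pet, par, state=None):
    """Run the daily steps and smooth the groundwater outflow with the MAXBAS weights."""
    state = dict(state) if state else {"SM": 0.5 * par["FC"], "SUZ": 0.0, "SLZ": 0.0}
    qgw = []
    for p, ep in zip(precip, pet):
        q, _, state = hbv_step(p, ep, state, par)
        qgw.append(q)
    w = maxbas_weights(par["MAXBAS"])
    return [sum(wi * qgw[t - i] for i, wi in enumerate(w) if t - i >= 0) for t in range(len(qgw))]
```

Because every flux is subtracted from the store that supplies it, P − Ea − Q over a step equals the change in SM + SUZ + SLZ, which is a convenient check when experimenting with parameter values.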

### Model calibration method

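A genetic algorithm is used to optimize the 15 free parameters of the XAJ model and the nine free parameters of the HBV model. For both models, the commonly used Nash–Sutcliffe coefficient (NS) and relative error (RE) are used to evaluate model performance:

$$\mathrm{NS} = 1 - \frac{\sum_{t=1}^{N}\left(Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t}\right)^{2}}{\sum_{t=1}^{N}\left(Q_{\mathrm{obs},t} - \bar{Q}_{\mathrm{obs}}\right)^{2}} \tag{1}$$

$$\mathrm{RE} = \frac{\sum_{t=1}^{N} Q_{\mathrm{sim},t} - \sum_{t=1}^{N} Q_{\mathrm{obs},t}}{\sum_{t=1}^{N} Q_{\mathrm{obs},t}} \times 100\% \tag{2}$$

where $Q_{\mathrm{obs},t}$, $Q_{\mathrm{sim},t}$ and $\bar{Q}_{\mathrm{obs}}$ are the observed discharge, the simulated discharge and the mean of the observed discharge series, respectively, and $N$ is the number of time steps.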
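Equations (1) and (2) translate directly into two small functions. This is a minimal sketch; the function names are illustrative, and the sign convention of RE (positive when the model overestimates total volume) is an assumption consistent with Equation (2) as written.

```python
def nash_sutcliffe(obs, sim):
    """NS = 1 - sum((Qobs - Qsim)^2) / sum((Qobs - mean(Qobs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def relative_error(obs, sim):
    """RE in percent: (sum(Qsim) - sum(Qobs)) / sum(Qobs) * 100."""
    return (sum(sim) - sum(obs)) / sum(obs) * 100.0


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
print(nash_sutcliffe(obs, sim), relative_error(obs, sim))
```

In a calibration loop such as a genetic algorithm, NS would typically serve as the fitness to maximize, with RE monitored as a volume check.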

The split-sample test (Klemeš 1986) is used to assess the performance of both models in simulating the rainfall-runoff relationship of the study area. In this test, these two models are calibrated and validated by using the 41-year historical runoff series of Hengyang runoff station. The 41-year historical runoff records are split into two sub-records, 1965–1985 and 1986–2005, for calibration and validation, respectively. When these models are used for design flood estimation, the 41-year observed data are all used for model calibration to achieve the best runoff simulation.

To evaluate the influence of different calibration periods and calibration data lengths on the design flood estimates, the 41 years of historical runoff records are divided into overlapping sub-records with lengths of 2 years (40 sub-records), 5 years (37 sub-records), 10 years (32 sub-records), 15 years (27 sub-records) and 20 years (22 sub-records). Model parameters are calibrated on each sub-record using the genetic algorithm, and the calibrated models are then used to simulate runoff over the whole period, 1965–2005, for comparison.
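The sub-record counts follow from sliding a window of fixed length one year at a time across 1965–2005, which gives 41 − L + 1 windows of length L. The one-year shift is an assumption inferred from the counts; the function name is hypothetical.

```python
def sliding_subrecords(first_year, last_year, length):
    """All windows of `length` consecutive years, shifted one year at a time."""
    n_years = last_year - first_year + 1
    return [(first_year + i, first_year + i + length - 1)
            for i in range(n_years - length + 1)]


for length in (2, 5, 10, 15, 20):
    windows = sliding_subrecords(1965, 2005, length)
    print(length, len(windows), windows[0], windows[-1])
# counts: 40, 37, 32, 27, 22
```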

### Pearson Type III distribution and L-moments approach

Annual maximum runoff series are selected from the 41-year daily runoff observations and model simulations to estimate the design flood for a given return period, and these series are analysed by fitting the Pearson Type III distribution. The Pearson Type III distribution was first applied in hydrology by Foster (1924) to describe the probability distribution of annual maximum flood peaks (Chow *et al.* 1988). It is the distribution recommended for official frequency analysis of annual maximum floods in China, and it is well tested and widely used across the country for design flood estimation (Hua 1987). The L-moments approach defined by Hosking (1990) is used to derive the three statistical parameters of the annual maximum floods: the arithmetic mean *E*_{x}, the coefficient of variation *C*_{v} and the coefficient of skewness *C*_{s}. Compared with conventional moments, L-moments are more robust to outliers and allow more accurate inference when the sample size is small.
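The L-moments step can be sketched in two parts: sample L-moments from unbiased probability-weighted moments, then conversion to the Pearson Type III parameters *E*_{x}, *C*_{v} and *C*_{s}. The conversion uses the rational approximations for the gamma shape parameter published by Hosking and Wallis; this is an illustrative sketch, not necessarily the exact procedure of the study, and the function names are hypothetical.

```python
import math


def sample_lmoments(data):
    """Return (l1, l2, t3) from the unbiased probability-weighted moments b0, b1, b2."""
    x = sorted(data)                     # ascending order
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    b0 = sum(x) / n
    # j is the 0-based rank, i.e. (rank - 1)
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return b0, l2, l3 / l2


def pe3_from_lmoments(l1, l2, t3):
    """Pearson III (E_x, C_v, C_s) from L-moments via rational approximations."""
    if abs(t3) <= 1e-6:
        # Zero skew: the distribution is normal and sigma = sqrt(pi) * l2.
        return l1, math.sqrt(math.pi) * l2 / l1, 0.0
    t = abs(t3)
    if t < 1.0 / 3.0:
        z = 3.0 * math.pi * t * t
        alpha = (1.0 + 0.2906 * z) / (z + 0.1882 * z ** 2 + 0.0442 * z ** 3)
    else:
        z = 1.0 - t
        alpha = (0.36067 * z - 0.59567 * z ** 2 + 0.25361 * z ** 3) / (
            1.0 - 2.78861 * z + 2.56096 * z ** 2 - 0.77045 * z ** 3)
    # Standard deviation from l2 and the gamma shape parameter alpha.
    sigma = l2 * math.sqrt(math.pi * alpha) * math.exp(math.lgamma(alpha) - math.lgamma(alpha + 0.5))
    cs = math.copysign(2.0 / math.sqrt(alpha), t3)
    return l1, sigma / l1, cs


# Illustrative annual peaks (made-up values, not the Hengyang record).
ex, cv, cs = pe3_from_lmoments(*sample_lmoments([4200, 5100, 6100, 6600, 7300, 8200, 9900, 12500]))
print(round(ex), round(cv, 3), round(cs, 3))
```

Working from L-moments rather than conventional moments keeps the skewness estimate bounded (|τ3| < 1), which is part of why the approach is robust for short records.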

### The GLUE method

Because of the uncertainties inherent in the observed data, hydrological model parameters, model structure and the choice of calibration period, design flood estimates are unavoidably uncertain. The uncertainty caused by model parameter equifinality is evaluated with the GLUE method, developed by Beven & Binley (1992). The main idea of this method is that no single optimal model structure and parameter set can perfectly represent a river basin; instead, many combinations of parameter values can reproduce the observed runoff of a given basin with the same efficiency (Freer *et al.* 1996; Beven & Freer 2001).

The general steps of the GLUE method are as follows. First, a large number of runs of a particular model are conducted with parameter sets randomly sampled from a subjectively defined parameter space with a prior probability distribution; as no prior information about the parameter space is available, a uniform distribution is used. Next, the likelihood value of each model run is calculated by comparing the simulations with the observations, with higher likelihood values indicating better simulations. A cut-off threshold is then chosen subjectively: runs with likelihood values below the threshold are considered ‘non-behavioural’ and are excluded from further analysis. Finally, the 95% confidence interval of the runoff simulations is derived from the remaining ‘behavioural’ simulations, using their likelihood values as relative weights. Annual maximum discharge series are extracted from the ‘behavioural’ simulations, the Pearson Type III distribution is fitted to each series, and the 95% confidence intervals of the design floods for different return periods are derived using the likelihood values as weighting factors.
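The four GLUE steps can be illustrated end to end on a deliberately tiny problem. This sketch assumes a hypothetical one-parameter linear reservoir (`toy_model`) in place of the XAJ or HBV model, a uniform prior on its parameter, NS as the likelihood with the 0.8 threshold used in this study, and weights equal to the behavioural NS values rescaled to sum to one (one possible reading of "standardized Nash–Sutcliffe"). The weighted-quantile rule (smallest value whose cumulative weight reaches the quantile) is also an assumption.

```python
import random


def toy_model(rain, k):
    """Hypothetical one-parameter linear reservoir standing in for XAJ or HBV."""
    q, out = 0.0, []
    for r in rain:
        q = k * q + (1.0 - k) * r
        out.append(q)
    return out


def nash_sutcliffe(obs, sim):
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalised weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w / total
        if cum >= q:
            return v
    return pairs[-1][0]


def glue(rain, obs, n_runs=5000, threshold=0.8, bounds=(0.05, 0.95), seed=42):
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_runs):
        k = rng.uniform(*bounds)                 # step 1: sample from a uniform prior
        sim = toy_model(rain, k)
        ns = nash_sutcliffe(obs, sim)            # step 2: likelihood of the run
        if ns >= threshold:                      # step 3: keep 'behavioural' runs only
            behavioural.append((ns, sim))
    if not behavioural:
        raise ValueError("no behavioural runs; widen bounds or lower the threshold")
    weights = [ns for ns, _ in behavioural]      # assumed likelihood weights
    lower, upper = [], []
    for t in range(len(obs)):                    # step 4: weighted 95% interval per time step
        values = [s[t] for _, s in behavioural]
        lower.append(weighted_quantile(values, weights, 0.025))
        upper.append(weighted_quantile(values, weights, 0.975))
    return len(behavioural), lower, upper


rain = [0.0, 10.0, 0.0, 0.0, 5.0, 20.0, 0.0, 0.0, 0.0, 3.0] * 5
obs = toy_model(rain, 0.6)                       # synthetic "observations"
n_beh, lo, up = glue(rain, obs)
print(n_beh, round(lo[5], 2), round(up[5], 2))
```

The same weights would be carried over to the design floods: fit the Pearson Type III distribution to the annual maxima of each behavioural run and take weighted quantiles of the resulting design flood values for each return period.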

The choice of likelihood function has been discussed extensively since the GLUE method was introduced (Beven & Binley 1992; Romanowicz *et al.* 1994). In this study, the standardized Nash–Sutcliffe value is used as the likelihood measure.

Two indices are used to give a quantitative evaluation of the GLUE simulation, the percentage of observations that are contained in the 95% confidence interval (P-95CI) (Li *et al.* 2009), and the average relative interval length (ARIL) proposed by Jin *et al.* (2010). These two indices are expressed as follows:
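$$\text{P-95CI} = \frac{N_{\mathrm{in}}}{N} \times 100\% \tag{3}$$

where $N_{\mathrm{in}}$ is the number of observations contained in the 95% confidence interval and $N$ is the total number of observations.

$$\mathrm{ARIL} = \frac{1}{N}\sum_{t=1}^{N}\frac{Q_{\mathrm{upper},t} - Q_{\mathrm{lower},t}}{Q_{\mathrm{obs},t}} \tag{4}$$

where $Q_{\mathrm{upper},t}$ and $Q_{\mathrm{lower},t}$ are the upper and lower boundaries of the 95% confidence interval at time $t$, respectively, and $Q_{\mathrm{obs},t}$ is the runoff observation.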
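The two interval indices reduce to a count and an average. This sketch assumes the interval bounds are inclusive when counting contained observations and that all observations are positive (ARIL divides by them); the function names are illustrative.

```python
def p95ci(obs, lower, upper):
    """Percentage of observations inside [lower, upper] (bounds treated as inclusive)."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return inside / len(obs) * 100.0


def aril(obs, lower, upper):
    """Average relative interval length: mean of (upper - lower) / observation."""
    return sum((up - lo) / o for o, lo, up in zip(obs, lower, upper)) / len(obs)


obs = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 18.0, 31.0, 35.0]
upper = [12.0, 25.0, 35.0, 45.0]
print(p95ci(obs, lower, upper), round(aril(obs, lower, upper), 4))
```

A good interval combines a high P-95CI with a low ARIL; widening the bounds raises both, so the two indices are read together.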

## RESULTS AND DISCUSSION

### Model calibration results

Table 3 summarizes the split-sample test results. Both models perform very well in the study area during the calibration period, and somewhat less well, in terms of NS and RE, during the validation period. Over the total simulation period, the Nash–Sutcliffe value exceeds 0.83 for both models, and the absolute RE is below 11.56% for the XAJ model and below 14.06% for the HBV model. Both models therefore represent the flow characteristics of the study area comparably well.

When all 41 years of observed data are used for model calibration, the Nash–Sutcliffe value is 0.88 for the XAJ model and 0.87 for the HBV model, and the RE is 0.01% and 5.23%, respectively. Both models are therefore well calibrated and give good simulation results. Because the runoff simulations are used for flood frequency analysis, the daily flow duration curves of the observed and simulated runoff are compared in Figure 4(a). The difference between the flow duration curve of the observations and those of the simulations is small except in the left-hand (high-flow) tail, as expected. The mismatches at the extreme left are mainly due to the imperfection of the models in simulating floods with high return periods and to errors in the model input data. Because annual maximum runoff series are analysed to estimate the design flood, special attention was given to the simulation of annual flood peaks: the 41 simulated annual flood peaks were plotted against the corresponding observed values to evaluate their deviation (Figure 4(b)). Figure 4(b) shows that the XAJ model tends to overestimate high flood peaks and underestimate low flood peaks, whereas the HBV model underestimates both high and low flood peaks. The impact of these imperfect peak simulations on design flood estimation is analysed in the next section.
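The two diagnostics used here, the daily flow duration curve and the annual maximum series, can be computed as follows. This is a sketch under assumptions not stated in the text: exceedance probabilities use the Weibull plotting position m/(n + 1), and annual maxima are taken per calendar year. The function names are hypothetical.

```python
from datetime import date


def flow_duration_curve(flows):
    """(exceedance probability in %, flow) pairs using Weibull plotting positions m/(n+1)."""
    ranked = sorted(flows, reverse=True)         # largest flow gets rank m = 1
    n = len(ranked)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


def annual_maxima(dates, flows):
    """Largest daily flow of each calendar year, as {year: peak}."""
    peaks = {}
    for d, q in zip(dates, flows):
        if d.year not in peaks or q > peaks[d.year]:
            peaks[d.year] = q
    return peaks


days = [date(1965, 6, 1), date(1965, 7, 1), date(1966, 5, 1), date(1966, 8, 1)]
print(flow_duration_curve([3.0, 1.0, 2.0]))
print(annual_maxima(days, [5.0, 9.0, 7.0, 4.0]))
```

The high-flow (left-hand) end of the curve is built from only a handful of days, which is one reason the mismatch between observed and simulated curves is largest there.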

### Design flood estimation

Annual maximum runoff series are selected from the observations and simulations, and their statistical parameters, the arithmetic mean *E*_{x}, the coefficient of variation *C*_{v} and the coefficient of skewness *C*_{s}, are calculated with the L-moments approach. The statistical parameters of the observed runoff are used as reference values. The results, listed in Table 4, show that the mean annual flood peak is underestimated by both models, while *C*_{v} is overestimated by both models because of the errors inherent in the simulated peaks. The *C*_{s} value is overestimated by the XAJ model and underestimated by the HBV model. Judged by these three statistical parameters alone, the XAJ model appears to simulate annual flood peaks better than the HBV model.

Parameters of the Pearson Type III distribution are calculated from these statistical parameters, and the distribution is then fitted to the annual maximum runoff series (Figure 5(a)). The design flood estimates for different return periods are listed in Table 5 and shown in Figure 5(b) for comparison. Because only 41 years of data are available, design flood estimates are presented for return periods up to 100 years only.
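Given *E*_{x}, *C*_{v} and *C*_{s}, the design flood for return period *T* can be written as *x*_{p} = *E*_{x}(1 + *C*_{v}*K*_{p}), where *K*_{p} is the Pearson Type III frequency factor for exceedance probability *p* = 1/*T*. The sketch below approximates *K*_{p} with the Wilson–Hilferty transformation of the standard normal quantile, which is commonly regarded as adequate for moderate skewness; this is a swapped-in approximation, not necessarily how Table 5 was computed, and the statistics in the example are made up rather than taken from Table 4.

```python
from statistics import NormalDist


def pe3_frequency_factor(cs, p_exceed):
    """Wilson-Hilferty approximation of the Pearson III frequency factor K_p."""
    z = NormalDist().inv_cdf(1.0 - p_exceed)     # standard normal quantile
    if abs(cs) < 1e-6:
        return z                                 # zero skew: Pearson III is normal
    k = cs / 6.0
    return (2.0 / cs) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def design_flood(ex, cv, cs, return_period):
    """x_p = E_x * (1 + C_v * K_p) with exceedance probability p = 1 / T."""
    return ex * (1.0 + cv * pe3_frequency_factor(cs, 1.0 / return_period))


# Illustrative statistics only (not the values in Table 4).
for T in (2, 5, 10, 20, 50, 100):
    print(T, round(design_flood(10000.0, 0.3, 0.6, T)))
```

With positive skew, *K*_{p} is slightly negative at *T* = 2, so the 2-year flood falls below the mean, while the spacing between quantiles widens as *T* grows; this is why errors in *C*_{v} and *C*_{s} matter most at long return periods.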

Figure 5(a) reveals that the XAJ model tends to overestimate the design floods of long return periods, while the HBV model underestimates them, even though both models are well calibrated on the same hydrometeorological data with a genetic algorithm. Figure 5(a) also shows that the frequency curve derived from the XAJ simulation is steeper than the other two curves, and that the curve derived from the HBV simulation lies below the frequency curve of the observed annual maximum series, although their slopes are similar. The reason is that the XAJ model gives a good estimate of the mean annual flood peak but overestimates *C*_{v}, whereas the HBV model gives a better estimate of *C*_{v} but underestimates the mean (Table 4). Figure 5(b) and Table 5 show that for low return periods the difference between the design flood estimates from the observed and simulated series is small, and that this difference increases with the return period. They also show that the absolute relative difference of the design floods estimated by the XAJ model is smaller than that of the HBV model, especially for low return periods. For both models, the maximum absolute relative difference is less than 11%. The difference between the design flood estimates derived from runoff observations and from simulations illustrates the uncertainties inherent in the continuous simulation approach, which are analysed further in the next section.

### Uncertainty caused by model parameter equifinality

In hydrological modelling practice, no unique ‘optimal’ parameter set can be found: many parameter sets give similar results, which is the well-known equifinality problem. In this study, the effect of equifinality on flow simulation and on design flood estimation is studied with the GLUE method. The likelihood threshold required by the GLUE method is subjectively set to 0.8. To reduce the total number of model runs, only the sensitive parameters of the XAJ model (KE, SM, KG, CI, CG and NK) are treated as free parameters; the other parameters retain the optimized values derived in the section ‘*Model calibration results*’. Because the snow routine of the HBV model is not needed in this study area, the number of free HBV parameters is reduced to nine. The ranges of the free parameters of both models used in the GLUE analysis are listed in Tables 1 and 2, and 100,000 model runs are conducted. After selection, 12,849 XAJ runs and 14,258 HBV runs are considered ‘behavioural’ and are used to derive the 95% confidence intervals of the runoff simulation and of the design flood estimates, with the likelihood values as weighting factors. For illustration, the 95% confidence intervals for a flood year (1978), a normal year (1987) and a dry year (1979) are plotted in Figure 6.

Figure 6 shows that, for both models, the 95% confidence interval contains most of the runoff observations, and no significant difference between the HBV and XAJ simulations can be found. Table 6 lists the indices used to assess the 95% confidence intervals: the P-95CI and ARIL values of the two models are close, with the HBV model having slightly higher P-95CI values.

The 95% confidence intervals of the design flood estimates of both models are shown in Figure 7, and their relative interval lengths are listed in Table 7. For the XAJ model, the confidence interval is distributed almost symmetrically around the frequency curve of the runoff observations, which is close to the interval median. For the HBV model, however, most of the confidence interval lies below the observed frequency curve, which indicates that the HBV model tends to underestimate hydrological extremes in this basin. Table 7 shows that, for the same return period, the XAJ model gives a wider 95% confidence interval even though it has fewer ‘behavioural’ simulations. It also shows that, for both models, the relative interval length is large even for a return period of only 2 years, which reveals that the uncertainty of model parameters and model structure has a great impact on design flood estimates obtained by continuous simulation.

Figure 8 shows the 95% confidence intervals for different return periods. The interval length increases with the return period, and for each return period the interval derived from the XAJ simulations is longer than that derived from the HBV simulations. In summary, however well the model parameters are calibrated, the frequency curve derived from a single deterministic model simulation is only one realization of the process being studied, and it is not reliable on its own because of the significant influence of model equifinality.

### Uncertainty caused by using different model calibration period and data length

Figure 9 shows the frequency curves derived from the runoff observations and from runoff simulations with different calibration periods and calibration data lengths. The different calibration periods and data lengths produce a wide spread of design flood estimates. For the same calibration data length, the spread of the XAJ estimates is larger than that of the HBV estimates, which indicates that the XAJ model is more sensitive to the choice of calibration period when used for design flood estimation by continuous simulation. The XAJ model again tends to overestimate the design floods of longer return periods, while the HBV model underestimates them. Compared with Figure 7, the band width in Figure 9 varies little with exceedance probability, although the band is slightly narrower in the middle and wider in the tails. Thus, the uncertainty of model parameters has a greater influence on design flood estimates for high return periods than for low return periods, whereas the influence of the calibration period does not vary significantly with the return period. Table 8 demonstrates this quantitatively.

Table 8 shows that the relative band widths derived from the XAJ simulations are larger than those from the HBV simulations. Comparing Table 8 with Table 7 shows that, for large return periods, the uncertainty caused by model parameter equifinality is more significant than that caused by using different calibration periods and data lengths when hydrological models are applied to design flood estimation by continuous simulation.

Figure 10 shows how the band width of the design flood estimates changes with the length of the calibration data. For a given return period the band width decreases as the calibration data length increases, which indicates that a longer calibration record reduces the uncertainty caused by the choice of calibration period. However, once the calibration length exceeds a certain value (between 10 and 15 years), further increases in length do not significantly reduce this uncertainty, especially for the HBV model, which has fewer parameters. For the study area, a calibration data length of 15–20 years for these conceptual hydrological models appears necessary and sufficient, and is recommended for design flood estimation by continuous simulation to minimize the uncertainty caused by the choice of calibration period.
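One way to set up an experiment like the one behind Figure 10 is to slide a calibration window of each candidate length through the record, recalibrate and re-estimate the design flood for every window, and record the spread of the results. The sketch assumes contiguous windows advanced one year at a time; the study's own sampling scheme is not described in this section, and `estimate` is a hypothetical callable standing in for calibration, continuous simulation and frequency analysis.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def calibration_windows(first_year: int, last_year: int, length: int) -> List[Tuple[int, int]]:
    """All contiguous calibration periods of `length` years within [first_year, last_year]."""
    return [(s, s + length - 1) for s in range(first_year, last_year - length + 2)]

def band_width_by_length(
    first_year: int,
    last_year: int,
    lengths: Iterable[int],
    estimate: Callable[[int, int], float],
) -> Dict[int, float]:
    """Spread (max - min) of the design flood estimates across all windows,
    for each calibration length. `estimate(start, end)` is hypothetical: it
    should calibrate on start..end and return the design flood for one return period."""
    out: Dict[int, float] = {}
    for n in lengths:
        values = [estimate(s, e) for s, e in calibration_windows(first_year, last_year, n)]
        out[n] = max(values) - min(values)
    return out
```

Plotting the returned widths against calibration length gives a curve of the kind shown in Figure 10; the point where it flattens marks the length beyond which a longer record adds little.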

The above analyses show that the choice of calibration period has a significant influence on design flood estimation, although this influence is smaller than that of model parameter equifinality. Once the hydrological model is well calibrated, extending the calibration period beyond a certain threshold brings little further improvement in design flood estimation.

## CONCLUSIONS

In this study, two hydrological models are used for design flood estimation by continuous simulation. The uncertainty inherent in this methodology due to model parameter equifinality is analysed with the GLUE method, and the uncertainty caused by different model calibration periods and calibration data lengths is also assessed. The study was conducted in a humid catchment in southern China. The following conclusions are drawn.

Both models give design flood estimates close to those derived from runoff observations, with a maximum absolute relative error (RE) of 11% for return periods of less than 100 years.
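The 11% figure compares the design flood derived from simulated runoff with that derived from observed runoff. A minimal sketch, assuming the usual definition RE = (Q_sim - Q_obs) / Q_obs × 100% and hypothetical dictionaries keyed by return period in years:

```python
def relative_error(simulated: float, observed: float) -> float:
    """RE in percent (assumed definition): (simulated - observed) / observed * 100."""
    return (simulated - observed) / observed * 100.0

def max_abs_re(sim_by_T: dict, obs_by_T: dict, max_T: float = 100) -> float:
    """Largest |RE| over all return periods strictly below max_T years."""
    return max(
        abs(relative_error(sim_by_T[T], obs_by_T[T]))
        for T in obs_by_T
        if T < max_T
    )
```

Restricting the maximum to T below 100 years mirrors the statement above; design floods of longer return periods are excluded from the comparison.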

The large width of the 95% confidence interval of the flood frequency curve indicates significant uncertainty caused by model parameter equifinality. However well the model parameters are calibrated, the frequency curve derived from a deterministic model simulation is only a single realisation of the possible values and is not reliable on its own because of the significant influence of model equifinality.

Different model calibration periods produce a wide spread of design flood estimates, and the band width varies little with exceedance probability. This indicates that the uncertainty induced by the choice of calibration period significantly affects design flood estimates for all return periods. The study also shows that once the calibration data record is long enough, further increases in its length bring little improvement in design flood estimation.

This study systematically investigates the applicability of hydrological models to design flood estimation by continuous simulation. The results show that this methodology carries considerable uncertainty, which should be taken seriously when the method is applied; otherwise the estimates may contain significant errors. This is, however, a preliminary study, because only two deterministic lumped models are used and only one runoff station in a humid region is considered. Further studies on other river basins with other hydrological models are needed before generalised conclusions and guidance can be drawn for design flood estimation with hydrological models.

## ACKNOWLEDGEMENTS

The study was financially supported by the National Natural Science Fund of China (51190094; 51339004; 51279138).

- First received 30 March 2015.
- Accepted in revised form 26 August 2015.

- © IWA Publishing 2016
