1 author:
Donald H. Burn
University of Waterloo
All content following this page was uploaded by Donald H. Burn on 24 March 2014.
Evaluation of Regional Flood Frequency Analysis With a Region of Influence Approach
DONALD H. BURN
A novel approach to regional flood frequency analysis is presented and evaluated. The technique is
referred to as the region of influence approach in that every site can have a potentially unique set of
gauging stations for use in the estimation of at-site extremes. The rationale for the methodology is
discussed, and several options for incorporating the approach into regional flood frequency analysis
are developed and compared with traditional regional estimation procedures. Through a Monte Carlo
experiment, the region of influence approach is demonstrated to provide improved at-site estimates of
extreme flow quantiles in terms of network average root mean squared error and comparable results
for bias. The method is further shown to have attractive features for estimating extremes for unusual
sites in a network of gauging stations.
where TP and n are parameters of the weighting function. Option 1 requires the definition of an upper and lower threshold value, a target number of stations for the ROI, and the parameters of the weighting function.

Option 2

This option has a constant threshold value which is given as

[...]

Option 3

The final option considered involves including all sites in the ROI such that the threshold is defined as

$$\theta_i = TL_i \tag{12}$$

and the weighting function is defined through (8)-(11). This option requires the specification of a threshold parameter and two weighting function parameters.

The first option presented above entails including a limited number of stations in the ROI for each site. The resulting stations are then expected to be very similar in extreme flow response to the site of interest. Options 2 and 3 represent variations on the contrasting approach to ROI formulation in that a comparatively large number of stations are included in each ROI (for option 3, all stations) and the weighting function is then used to reflect the relative proximity of stations. Methods of selecting parameter values for each of the options, in keeping with the modeling philosophies indicated above, will be discussed in a subsequent segment of this paper.

With the definition of the station membership for each region of influence and the determination of the weight assigned to each station in the ROI, it is possible to estimate at-site extremes incorporating information from all stations in the ROI. The methodology for combining information from all of the included stations will necessarily be somewhat specific to the distribution function selected for extreme flows and the parameter estimation technique used. In this work, the generalized extreme value (GEV) distribution is used in conjunction with the method of probability weighted moments (PWM) since this combination was identified by Potter [1987] as an efficient basis for combining extreme flow data. Other distribution functions and parameter estimation techniques could be employed, possibly necessitating revised procedures for effecting the spatial data transfer.

The GEV distribution is given as

[...]

$$T_k^i = \frac{\sum_{j \in I_i} t_{k,j}\, np_j\, \eta_{ij}}{\sum_{j \in I_i} np_j\, \eta_{ij}}, \qquad k = 1, 2 \tag{18}$$

where $I_i$ is the set of stations in the ROI for site i, and $np_j$ is the number of years of record for station j. The index i on the regionalized PWM indicates the site for which the weighted PWM is calculated. From the weighted PWM values for each ROI, the three parameters of the GEV distribution can be estimated via [Hosking et al., 1985]

$$c = \frac{2T_1^i - 1}{3T_2^i - 1} - \frac{\log 2}{\log 3} \tag{19}$$

$$g = 7.859c + 2.955c^2 \tag{20}$$

$$\alpha = \frac{(2T_1^i - 1)\, g}{\Gamma(1 + g)\,(1 - 2^{-g})} \tag{21}$$

$$\xi = 1 + \frac{\alpha\{\Gamma(1 + g) - 1\}}{g} \tag{22}$$

where $\Gamma(\cdot)$ signifies the gamma function and each parameter should be regarded as having an index i associated with it referring to the station for which the set of parameters were calculated. Hosking et al. [1985] indicate that the above equations provide satisfactory parameter estimates for $-0.5 \le g \le 0.5$. A dimensionless growth curve can then be estimated from

$$x_T^i = \xi + \frac{\alpha}{g}\left\{1 - \left[-\log\left(\frac{T - 1}{T}\right)\right]^g\right\} \tag{23}$$

where $x_T^i$ is the estimate of the dimensionless T-year flow value for site i. An estimate of the T-year event for any site can be obtained from
2260 BURN: REGIONAL FLOOD FREQUENCY ANALYSIS
$$X_T^i = M_i\, x_T^i \tag{24}$$

where $X_T^i$ is the T-year flow at site i, and $M_i$ is the mean annual flood for site i.

MONTE CARLO EXPERIMENT

Experimental Design

A common problem in evaluating methods for estimating flood quantiles is that the available data record represents only one realization of what could be regarded as a stochastic flood generation process. The "true" value for any flood quantile at a particular location is therefore inherently unknowable. Thus one must often resort to Monte Carlo sampling to evaluate the relative performance of different flood estimators. A disadvantage of this approach arises from the need to specify the form and parameters of the parent extreme flow distribution for all sites. To avoid arbitrariness in this process, it is essential that the selected parent distributions be representative of conditions that could occur (see, for example, Lettenmaier et al. [1987]). One possible approach to selecting realistic parent distribution function characteristics is to allow the available data record to suggest parent distributions for each station under consideration. This approach, previously used by Burn [1988], involves calculating distribution parameters for every site, and setting the "parent" parameters equal to the calculated values. Sample sets of annual flow data are then generated for each site using the assumed parent parameter values and the length of record observed at the site. Generated data sets for all sites are then used to evaluate the performance of the ROI options outlined in the previous section and to compare this approach with a traditional regionalization procedure involving fixed regions and also with results from using all available stations.

The setting up of the Monte Carlo experiment involves the following steps: (1) Choose a set of gauging stations, and for each station in the network, estimate the at-site parameters for the GEV distribution; these parameter values will define the parent distribution for the station. (2) Select a set of attributes to define station similarity following the procedure outlined below. (3) Determine parameter values for each of the ROI options based on the station similarity measure and the emphasis of the particular option. (4) Determine characteristics of the traditional regionalization approach following the approach of Burn [1989].

Data Set Network Description

The data set used to evaluate the ROI technique consisted of 45 gauging stations located on natural rivers in southern Manitoba. The drainage area for the sites included in the network ranged from 46 to 4200 km² with a median value of 414 km². The number of years of record at the gauging stations ranged from 20 to 42 years with a median value of 25 years. Further information on the stations used to define the parent distributions for the network of stations is summarized in Table 1.

Attribute Selection

All of the attributes considered as candidates for inclusion in the distance metric must be readily available for all stations in the data network. As previously noted, all of the attributes should be related to the extreme flow response at the station, but, as well, reliable estimates of the attribute values should be obtainable from the available data base which comprises the annual flow record and limited information describing physical features of the contributing drainage area. The set of candidate attributes consisted of the following: (1) the coefficient of variation (CV) of the annual flow series, (2) a plotting position estimate of the 10-year flow event (Q10) interpolated from the available annual flow series, (3) a variation on the Pearson skewness (PS) measure defined as

$$PS = \frac{\mu - m}{\sigma} \tag{25}$$

where $\mu$ is the mean, m is the median, and $\sigma$ is the standard deviation, of the annual flow series, (4) the skewness coefficient (SK) of the annual flow series with a bias correction for data set length [Kite, 1977], and (5) the drainage area (DA) of the basin contributing to the flow at the gauging station.

From the candidate attributes, a reduced set of attributes was identified by comparing the attributes with an at-site estimate of the 100-year flow event obtained from the annual flow series assuming the GEV distribution. Two pieces of information are required from the selection process. The set of attributes to include must be identified and a relative importance must be assigned for each attribute. To accomplish both of these tasks, the correlation between the 100-year event and each attribute was calculated, which led to the selection of CV, Q10, and PS as the attributes. The selected weighting values corresponded to the observed correlation between the attribute and the extreme flow estimate. In addition to calculating correlations, the three selected attributes were confirmed as desirable measures by plotting the attribute values versus the extreme flow estimates. The resulting plots indicated essentially linear relationships for each of the selected attributes and were also useful for identifying "unusual" stations which showed up as outliers on the scatter diagrams.

Parameters for ROI Options

The parameters for the ROI options are selected considering the philosophy incorporated into the individual options and the characteristics of the gauging stations that make up the data network. An important part of the basis for choosing parameter values is the matrix of distance metric values which contains the weighted distance from every station to every other station. While the diagonal elements of this matrix are zero, the terms above the diagonal (or the terms below the diagonal) include all observed nonzero distance values. The elements above the diagonal were sorted by magnitude, resulting in something analogous to a distribution for weighted distance values between station pairs. Thus one could determine the median distance, the largest or smallest distance, or a particular percentile of the distance values (i.e., the 90th percentile distance value would imply that only 10% of the distance values were greater than the selected value). Selected percentiles of the sorted distance values were used as a guideline for selecting threshold values for the ROI options. The chosen percentiles acted as a
guideline only because intuitively the threshold values should correspond to breakpoints in the array of distance values. The procedure was thus to choose a particular percentile of the distance distribution and then look for a gap in the distribution close to the selected percentile. Although an element of judgment is required to select the percentile values to use for a particular threshold, preliminary investigations have indicated that the methodology is not overly sensitive to the values selected for the thresholds.

For the data set examined, the lower threshold parameter, $\theta_L$, was set equal to the 25th percentile while the 75th percentile was selected for the upper threshold value, $\theta_U$. The target number of stations, NST, for an ROI was set at 15 (one third of the available stations). The percentiles and target values selected reflect the diversity of the stations that constitute the network for this data set. The parameter values for the weighting function, $\eta_{ij}$, were chosen considering the modeling approach taken with the individual options. Option 1 involves an ROI containing only those sites that are reasonably similar to the target station so the weighting function values should be substantially different from zero even at the upper threshold. To incorporate this behavior, the value of n was set to 2.5 and TP was given a value corresponding to the 85th percentile of the distance value distribution. In contrast, the weighting functions for options 2 and 3 should give comparatively low weights to stations at the threshold since both of these options entail including a large number of stations in the ROI. As such, the value of n was set to 0.10 and TPP was also set to the 85th percentile of the distance value distribution.

Parameters for Identifying Fixed Regions

The traditional regionalization approach, used as one benchmark for comparing the performance of the ROI options, was based on clustering in the three-dimensional attribute space used to define the distance metric for the ROI approach. Within the clustering approach, several pieces of
information are required [Burn, 1989]. Each attribute forming part of the distance metric requires a weight, reflecting the relative importance of the attribute in defining basin similarity. The correlation coefficient between each attribute and the 100-year event was used as a weighting value in a similar approach to that used to assign relative importance to the attributes with the ROI technique. The number of regions for the 45 stations in the network was set at three following results from Burn [1989]. The division of the stations based on attribute values derived from the at-site parent distribution function resulted in three regions which each passed the regional homogeneity test described by Wiltshire [1986], indicating that the traditional regionalization approach represents a reasonable data partitioning. To further evaluate the fixed regions, a convenient, but simple, measure of regional heterogeneity is the normalized regional range in the CV values [Lettenmaier et al., 1987] defined as

$$R^*(CV) = \frac{R(CV)}{M(CV)} \tag{26}$$

where R(CV) is the range of CV values for the region and M(CV) is the median CV value for the region. Calculating the normalized regional range for the three parent regions resulted in values of 0.356, 0.351, and 0.401 for regions 1, 2, and 3, respectively. The stations are listed in Table 1 according to parent region membership with the first 19 stations constituting region 1, the next 14 comprising region 2, and the final 12 stations making up region 3. The normalized regional range for the entire network of 45 stations is 0.909, indicating again that the regionalization process yields a reasonable partitioning of the stations.

Simulation of Data Sets

With the characteristics of the various regionalization options defined, it is possible to generate data sets and evaluate the approaches. To accomplish this, 1000 samples of extreme flow data for each site were generated in accordance with the parent parameter values and the number of years of annual flow data at the site. For each of the 1000 samples, it was necessary to complete the following steps: (1) Define the region of influence and weighting function values for each station and each option in accordance with the procedures outlined above. (2) Determine at-site estimates for extreme flow quantiles for every site and compare with theoretical values. (3) Determine traditional fixed regions and estimate at-site extremes for every site based on regional growth curves and compare with theoretical values. (4) Estimate at-site extremes using all 45 stations in the network and again compare with the theoretical values.

PRESENTATION OF RESULTS

The relative performance of the region of influence options outlined above was evaluated in terms of measures of the accuracy and precision of the estimates of at-site extreme flow quantiles. In addition, the performance of the ROI options was compared with the performance of the traditional regional estimator based on clustering and the estimation using all available stations, which is a regional estimator with one region. The regional estimators involved including all the stations in each region for the at-site estimation of extremes for every site in the region. All stations were assigned weighting values, $\eta_{ij}$, of unity for the combination of at-site PWMs through (18). All of the options were evaluated in terms of root mean squared error (RMSE) and bias as defined through

$$\mathrm{RMSE}_T = \frac{1}{NS} \sum_{k=1}^{NS} \left[ \frac{1}{N} \sum_{j=1}^{N} \left( \frac{Q_{kj}^T - Q_k^T}{Q_k^T} \right)^2 \right]^{1/2} \tag{27}$$

$$\mathrm{BIAS}_T = \frac{1}{NS} \sum_{k=1}^{NS} \frac{1}{N} \sum_{j=1}^{N} \frac{Q_{kj}^T - Q_k^T}{Q_k^T} \tag{28}$$

where $\mathrm{RMSE}_T$ is the root mean squared error for return period T, NS is the number of sites in the data set, N is the number of Monte Carlo samples, $Q_{kj}^T$ is the estimate for the T year event at site k for sample j, $Q_k^T$ is the theoretical value for the T year event at site k, and $\mathrm{BIAS}_T$ is the average bias for return period T. The two performance measures were calculated for each of the alternatives with the results summarized in Table 2, along with standard errors for each performance measure estimate.

TABLE 2. Performance Measures and Standard Errors (in Parentheses)

                      Return Period, years
Option       25          50          100         200

RMSE
1           0.103       0.143       0.189       0.241
           (0.00203)   (0.00246)   (0.00335)   (0.00479)
2           0.907       0.138       0.185       0.240
           (0.00165)   (0.00227)   (0.00334)   (0.00488)
3           0.089       0.123       0.161       0.203
           (0.00117)   (0.00186)   (0.00255)   (0.00349)
R           0.125       0.179       0.240       0.308
           (0.00191)   (0.00279)   (0.00401)   (0.00562)
R1          0.142       0.188       0.248       0.298
           (0.00075)   (0.00162)   (0.00240)   (0.00342)

BIAS
1          -0.025      -0.031      -0.035      -0.036
           (0.00062)   (0.00089)   (0.00119)   (0.00152)
2          -0.014      -0.015      -0.014      -0.010
           (0.00060)   (0.00089)   (0.00121)   (0.00156)
3          -0.006      -0.004       0.000       0.006
           (0.00062)   (0.00094)   (0.00130)   (0.00168)
R          -0.015      -0.015      -0.010      -0.001
           (0.00059)   (0.00089)   (0.00123)   (0.00161)
R1          0.000       0.005       0.012       0.023
           (0.00062)   (0.00095)   (0.00132)   (0.00172)

The results in Table 2 indicate that the ROI options are uniformly better than the regional estimators in terms of RMSE. In comparing ROI options, the close agreement between options 1 and 2, particularly on the RMSE measure, is interesting in that the two options represent very different philosophies for defining a region of influence. In option 1, only those sites that are relatively close in attribute space to the candidate site are included in the ROI, whereas option 2 seeks to include many sites with correspondingly reduced weightings for the more remote sites. The results in Table 2
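
As an illustration of the performance measures in (27) and (28), the following is a minimal sketch rather than the procedure actually used in the study. It assumes that the square root in (27) is taken per site before averaging over the network (the "network average" reading), and the function name, argument layout, and toy numbers are all hypothetical.

```python
import math


def performance_measures(estimates, true_values):
    """Network-average relative RMSE and bias for one return period.

    estimates[k][j] is the estimated T-year event at site k for Monte Carlo
    sample j; true_values[k] is the theoretical T-year event at site k.
    Assumption: the RMSE is computed per site and then averaged over sites.
    """
    n_sites = len(true_values)
    rmse_total = 0.0
    bias_total = 0.0
    for site_estimates, q_true in zip(estimates, true_values):
        # Relative error of every sample at this site.
        rel_err = [(q - q_true) / q_true for q in site_estimates]
        n_samples = len(rel_err)
        rmse_total += math.sqrt(sum(e * e for e in rel_err) / n_samples)
        bias_total += sum(rel_err) / n_samples
    return rmse_total / n_sites, bias_total / n_sites


# Toy data (hypothetical): two sites, two samples each.
toy_estimates = [[90.0, 110.0], [210.0, 230.0]]
toy_true = [100.0, 200.0]
rmse, bias = performance_measures(toy_estimates, toy_true)
print(f"RMSE = {rmse:.4f}, BIAS = {bias:.4f}")
```

Both measures are expressed relative to the theoretical quantile, so sites with very different flow magnitudes contribute on a comparable scale.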
CONCLUSIONS
DISCUSSION
would be identical to that outlined herein, but the set of candidate attributes would be limited to measures not relying on annual flow data.

REFERENCES

Acreman, M. C., Regional flood frequency analysis in the U.K.: Recent research-new ideas, report, Inst. of Hydrol., Wallingford, United Kingdom, 1987.
Acreman, M. C., and S. E. Wiltshire, Identification of regions for regional flood frequency analysis, (abstract), Eos Trans. AGU, 68(44), 1262, 1987.
Burn, D. H., Delineation of groups for regional flood frequency analysis, J. Hydrol., 104, 345-361, 1988.
Burn, D. H., Cluster analysis as applied to regional flood frequency, J. Water Resour. Plann. Manage. Div. Am. Soc. Civ. Eng., 115(5), 567-582, 1989.
Cook, B. G., P. Laut, M. P. Austin, D. N. Body, D. P. Faith, M. J. Goodspeed, and R. Srikanthan, Landscape and rainfall indices for prediction of streamflow similarities in the Hunter Valley, Australia, Water Resour. Res., 24(8), 1283-1298, 1988.
Hosking, J. R., J. R. Wallis, and E. F. Wood, Estimation of the generalized extreme-value distribution by the method of probability-weighted moments, Technometrics, 27(3), 251-261, 1985.
Kite, G. W., Frequency and Risk Analysis in Hydrology, Water Resources Publications, Littleton, Colo., 1977.
Lettenmaier, D. P., J. R. Wallis, and E. F. Wood, Effect of heterogeneity on flood frequency estimation, Water Resour. Res., 23(2), 313-323, 1987.
Potter, K. W., Research on flood frequency analysis: 1983-1986, Rev. Geophys., 25(2), 113-118, 1987.
Tasker, G. D., Comparing methods of hydrologic regionalization, Water Resour. Bull., 18(6), 965-970, 1982.
Wiltshire, S. E., Regional flood frequency analysis, I, Homogeneity statistics, Hydrol. Sci. J., 31(3), 321-333, 1986.

D. H. Burn, Department of Civil Engineering, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2.

(Received November 15, 1989; revised May 11, 1990; accepted May 24, 1990.)
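
The estimation chain in equations (18) through (24) can be sketched in a few lines of code. This is an illustrative sketch only, not the implementation used in the study: the function names, the symbols `alpha` and `xi` for the scale and location parameters, and the three-station example values are hypothetical, and the Gumbel limit g = 0 is not handled.

```python
import math


def weighted_pwm(t_values, record_years, weights):
    """Weighted regional PWM ratio for one ROI, following the form of (18)."""
    numerator = sum(t * n * w for t, n, w in zip(t_values, record_years, weights))
    denominator = sum(n * w for n, w in zip(record_years, weights))
    return numerator / denominator


def gev_parameters(t1, t2):
    """Shape g, scale alpha, and location xi, following (19)-(22)."""
    c = (2.0 * t1 - 1.0) / (3.0 * t2 - 1.0) - math.log(2.0) / math.log(3.0)
    g = 7.859 * c + 2.955 * c ** 2
    # g == 0 would divide by zero below; this sketch does not treat that case.
    alpha = (2.0 * t1 - 1.0) * g / (math.gamma(1.0 + g) * (1.0 - 2.0 ** (-g)))
    xi = 1.0 + alpha * (math.gamma(1.0 + g) - 1.0) / g
    return g, alpha, xi


def growth_curve(return_period, g, alpha, xi):
    """Dimensionless T-year flow, following (23)."""
    y = -math.log(1.0 - 1.0 / return_period)
    return xi + (alpha / g) * (1.0 - y ** g)


def t_year_flow(return_period, mean_annual_flood, g, alpha, xi):
    """T-year event: mean annual flood times the growth curve, as in (24)."""
    return mean_annual_flood * growth_curve(return_period, g, alpha, xi)


# Hypothetical ROI of three stations: at-site PWM ratios, record lengths,
# and weighting function values (all values invented for illustration).
years = [25, 30, 40]
eta = [1.0, 0.8, 0.5]
t1 = weighted_pwm([0.62, 0.60, 0.64], years, eta)
t2 = weighted_pwm([0.46, 0.44, 0.48], years, eta)
g, alpha, xi = gev_parameters(t1, t2)
q100 = t_year_flow(100, 150.0, g, alpha, xi)
print(f"g = {g:.4f}, alpha = {alpha:.4f}, xi = {xi:.4f}, Q100 = {q100:.1f}")
```

Because the PWM ratios are dimensionless, the fitted growth curve has a mean of one, and the mean annual flood restores the site's own scale in the final step.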