R.-D. Reiss
Approximate Distributions
of Order Statistics
With Applications to Nonparametric
Statistics
With 30 Illustrations
Springer-Verlag
New York Berlin Heidelberg
London Paris Tokyo
R.-D. Reiss
Universitat Gesamthochschule Siegen
Fachbereich 6, Mathematik
D-5900 Siegen
Federal Republic of Germany
Mathematics Subject Classification (1980): 62-07, 62B15, 62E20, 62G05, 62G10, 62G30
Library of Congress Cataloging-in-Publication Data
Reiss, Rolf-Dieter.
Approximate distributions of order statistics.
(Springer series in statistics)
Bibliography: p.
Includes indexes.
1. Order statistics. 2. Asymptotic distribution
(Probability theory) 3. Nonparametric statistics.
I. Title. II. Series.
QA278.7.R45 1989
519.5
88-24844
Printed on acid-free paper.
9 8 7 6 5 4 3 2 1
ISBN-13: 978-1-4613-9622-2
e-ISBN-13: 978-1-4613-9620-8
DOI: 10.1007/978-1-4613-9620-8
Preface
and topics that are taught in introductory probability and statistics courses
are necessary for the understanding of this book. To reinforce previous knowledge as well as to fill gaps, we shall frequently give a short exposition of
probabilistic and statistical concepts (e.g., that of conditional distribution and
approximate sufficiency).
The results are often formulated for distributions themselves (and not only
for distribution functions) and so we need, as far as order statistics are
concerned, the notion of Borel sets in a Euclidean space. Intervals, open sets,
and closed sets are special Borel sets. Large parts of this book can be understood without prior knowledge of technical details of measure-theoretic
nature.
My research work on order statistics started at the University of Cologne,
where influenced by J. Pfanzagl, I became familiar with expansions and
statistical problems. Lecture notes of a course on order statistics held at the
University of Freiburg during the academic year 1976/77 can be regarded as
an early forerunner of the book.
I would like to thank my students B. Dohmann, G. Heer, and E. Kaufmann
for their programming assistance. G. Heer also skillfully read through larger
parts of the manuscript. It gives me great pleasure to acknowledge the cooperation, documented by several articles, with my colleague M. Falk. The
excellent atmosphere within the small statistical research group at the University of Siegen, which includes A. Janssen and F. Marohn, facilitated the writing
of this book. Finally, I would like to thank W. Stute, and those not mentioned
individually, for their comments.
Siegen, FR Germany
Rolf-Dieter Reiss
Contents

Preface

CHAPTER 0. Introduction (Sections 0.1-0.5)

PART I. Exact Distributions and Basic Tools

CHAPTER 1 (Sections 1.1-1.8; P.1. Problems and Supplements)

CHAPTER 2
2.1. Introduction
2.2. Distribution Functions and Densities
P.2. Problems and Supplements
Bibliographical Notes

CHAPTER 3 (Sections 3.1-3.3; P.3. Problems and Supplements)

PART II. Asymptotic Theory

CHAPTER 4
CHAPTER 5
CHAPTER 6 (Sections 6.1-6.4; P.6. Problems and Supplements)
CHAPTER 7

PART III

CHAPTER 8
8.1. Sample Quantiles
8.2. Kernel Type Estimators of Quantiles
8.3. Asymptotic Performance of Quantile Estimators
8.4. Bootstrap via Smooth Sample Quantile Function
P.8. Problems and Supplements
Bibliographical Notes

CHAPTER 9 (Sections 9.1-9.7; P.9. Problems and Supplements)
CHAPTER 10

Bibliography
Author Index
Subject Index
CHAPTER 0
Introduction
Let us start with a detailed outline of the intentions and of certain characteristics of this book.
Consider r.v.'s ξ_n with d.f.'s F_n such that

F_n(t) → F_0(t),  n → ∞,  (1)

for every t. In the sequel let us assume that F_0 is continuous. It follows from (1) that

P{ξ_n ∈ I} → P{ξ_0 ∈ I},  n → ∞,  (2)

uniformly over all intervals I. In general, (2) does not hold for every Borel set I. However, if the d.f.'s F_n have densities, say, f_n such that f_n(t) → f_0(t), n → ∞, for almost all t, then it is well known that (2) is valid w.r.t. the variational distance, that is,
sup |P{ξ_n ∈ B} − P{ξ_0 ∈ B}| → 0,  n → ∞,  (3)

where the sup is taken over all Borel sets B. For central order statistics X_{r(n):n} the classical asymptotic normality result reads

P{a_n^{-1}(X_{r(n):n} − b_n) ≤ t} → P{ξ_0 ≤ t},  n → ∞,  (4)

for every t, with ξ_0 denoting a standard normal r.v. and a_n, b_n normalizing
constants. The two classical methods of proving (4) are
(a) an application of the central limit theorem to binomial r.v.'s,
(b) a direct proof of the pointwise convergence of the corresponding densities
(e.g., H. Cramér (1946)).
However, it is clear that (b) yields the convergence in a stronger sense,
namely, w.r.t. the variational distance. We have
sup |P{a_n^{-1}(X_{r(n):n} − b_n) ∈ B} − P{ξ_0 ∈ B}| → 0,  n → ∞,  (5)
where the sup is taken over all Borel sets B. A more systematic study of the
strong convergence of distributions of order statistics was initiated by L. Weiss
(1959, 1969a) and S. Ikeda (1963). These results particularly concern the joint
asymptotic normality of an increasing number of order statistics.
The convergence of densities of central order statistics was originally
studied for technical reasons; these densities are of a simpler analytical form
than the corresponding d.f.'s. On the other hand, when treating weak convergence of extreme order statistics it is natural to work directly with d.f.'s. To highlight the foregoing remark the reader is reminded of the fact that F^n is the d.f. of the largest order statistic (maximum) X_{n:n} of n independent and identically distributed r.v.'s with common d.f. F.
The meanwhile classical theory for extreme order statistics provides necessary and sufficient conditions for a d.f. F to belong to the domain of attraction of a nondegenerate d.f. G; that is, the weak convergence

F^n(a_n x + b_n) → G(x),  n → ∞,  (6)

holds for some choice of constants a_n > 0 and reals b_n. If F has a density then
one can make use of the celebrated von Mises conditions to verify (6). These
conditions are also necessary for (6) under further milder conditions imposed
on F. In particular, the d.f.'s treated in statistical textbooks satisfy one of the von Mises conditions. Moreover, it turns out that the convergence w.r.t. the variational distance holds. This may be written as
sup |P{a_n^{-1}(X_{n:n} − b_n) ∈ B} − G(B)| → 0,  n → ∞,  (7)
where the sup is taken over all Borel sets B. Note that the symbol G is also
0.2. Approximations
used for the probability measure corresponding to the d.f. G. Apparently, (7)
implies (6).
The relation (7) can be generalized to the joint distribution of upper
extremes X_{n−k+1:n}, X_{n−k+2:n}, …, X_{n:n} where k = k(n) is allowed to increase
to infinity as the sample size n increases.
We want to give some arguments why our emphasis is on the variational and Hellinger distances instead of the Kolmogorov-Smirnov distance:

(a) There are mathematical reasons, namely, to formulate the results as strongly as possible. One can add that the problems involved are very challenging.
(b) Results in terms of d.f.'s look awkward if the dimension increases with the
sample size. Of course, the alternative outcome is the formulation in terms
of stochastic processes.
(c) It is necessary to use the variational distance (and, as an auxiliary tool,
the Hellinger distance) in connection with model approximation. In other
words, certain problems cannot be solved in a different way.
0.2. Approximations
The joint distributions of order statistics can be described explicitly by analytical expressions involving the underlying d.f. F and density f. However, in
most cases it is extremely cumbersome to compute the exact numerical values
of probabilities concerning order statistics or to find the analytical form of
d.f.'s of functions of order statistics. Hence, it is desirable to find approximate
distributions. In view of practical and theoretical applications these approximations should be of a simple form.
The classical approach of finding approximate distributions is given by the
asymptotic theory for sequences of order statistics Xr(n):n with the sample size
n tending to infinity:
(a) If r(n) → ∞ and n − r(n) → ∞ as n → ∞ then the order statistics are asymptotically normal under mild regularity conditions imposed on F.
(b) If r(n) = k or r(n) = n − k + 1 for every n with k being fixed then the order statistics are asymptotically distributed according to an extreme value distribution (which is different from the normal distribution).

In the intermediate cases (that is, r(n) → ∞ and r(n)/n → 0, or n − r(n) → ∞ and (n − r(n))/n → 0 as n → ∞) one can either use the normal approximation or an approximation by means of a sequence of extreme value distributions.
Thus, the problem of computing an estimate of the remainder term enters the
scene; sharp estimates will make the different approximations comparable.
In the case of maxima of normal r.v.'s we shall see that a certain sequence
of extreme value distributions provides a better approximation than the limit
distribution.
The approximations can be improved by expansions of the form

Q + Σ_i ν_{i,n},

where Q is the limiting distribution and the ν_{i,n} are signed measures depending on the sample size n. A prominent example is provided by Edgeworth expansions. Usually, the signed measures have polynomials h_{i,n} as densities w.r.t. Q. If Q has a density g then the expansion may be written as

(Q + Σ_i ν_{i,n})(B) = ∫_B (1 + Σ_i h_{i,n}(x)) g(x) dx  (8)

for every Borel set B. Specializing (8) to B = (−∞, t], one gets approximations to d.f.'s of order statistics.
The bound of the remainder term of an approximation will involve
(a) unknown universal constants, and
(b) some known terms which specify the dependence on the underlying d.f.
and the index of the order statistic.
Since the universal constants are not explicitly stated, our considerations
belong to the realm of asymptotics.
The bounds give a clear picture of the dependence of the remainder terms on the underlying distribution. Much emphasis is laid on providing numerical examples to show that the asymptotic results are relevant for small and moderate sample sizes.
sup |P{(X_{[np_1]:n}, …, X_{[np_k]:n}) ∈ B} − P{(Y_1, …, Y_k) ∈ B}| =: δ(F),  (9)

where the sup is taken over all Borel sets B.
Alternatively, one may use densities, which play a key role in our methodology. As far as visual aspects are concerned the maximum deviation of densities is more relevant than the L_1-distance (which is equivalent to the variational distance of distributions).
The problem that discrete d.f.'s (like sample d.f.'s) have no densities can be overcome by using smoothing techniques like histograms or kernel density estimates. Thus the data points can be visualized by densities. The q.f. is another useful diagnostic tool to study the tails of the distribution.
The graphical illustrations in the book were produced by means of the
interactive statistical software package ADO.
functionals of order statistics, the Bahadur statistic and the bootstrap method
are treated in Chapter 6. Certain aspects of asymptotic theory of order
statistics in the multivariate case are studied in Chapter 7.
Our own interests heavily influence the selection of statistical problems in
Part III, and we believe the topics are of sufficient importance to be generally
interesting.
In Chapter 8 we study the problem of estimating the q.f. and related
problems within the nonparametric framework. Comparisons of semiparametric models of actual distributions with extreme value and normal
models are made in Chapters 9 and 10. The applicability of these comparisons
is illustrated by several examples.
Notation used in the sequel: F^{-1} (quantile function), α(F) and ω(F) (left and right endpoints of F), 1_B (indicator function of the set B), X =_d Y (equality in distribution), w.p.1 (with probability one), ξ_n (r.v.'s defined on a probability space).
PART I
EXACT DISTRIBUTIONS
AND BASIC TOOLS
CHAPTER 1
After an introduction to the basic notation and elementary, important techniques which concern the distribution of order statistics we derive, in Section
1.3, the d.f. and density of a single order statistic. From this result and from
the well-known fact that the spacings of exponential r.v.'s are independent (the
proof is given in Section 1.6) we deduce the joint density of several order
statistics in Section 1.4.
In Sections 1.3 and 1.4 we shall always assume that the underlying d.f. is absolutely continuous. Section 1.5 will provide extensions to continuous and discontinuous d.f.'s.
In Section 1.6, the independence of spacings of exponential r.v.'s and the independence of ratios of order statistics of uniform r.v.'s are treated in detail.
Furthermore, we study the well-known representation of order statistics of
uniform r.v.'s by means of exponential r.v.'s. This section includes extensions
from the case of uniform r.v.'s to that of generalized Pareto r.v.'s.
In Section 1.7 various results are collected concerning functional parameters of order statistics, like moments, modes, and medians.
Finally, Section 1.8 provides a detailed study of the conditional distribution of one collection of order statistics conditioned on another collection of order statistics. This result, which is related to the Markov property of order statistics, will be one of the basic tools in this book.
As special cases we obtain Z_{1:n} = min and Z_{n:n} = max. Such a representation of order statistics is convenient when order statistics of different samples have to be dealt with simultaneously. Then, given another sequence ξ'_1, …, ξ'_n of r.v.'s, we can write X'_{r:n} = Z_{r:n}(ξ'_1, …, ξ'_n).
Z_r ≤ t  iff  Σ_{i=1}^n 1_{(−∞,t]}(x_i) ≥ r,  (1.1.6)

with Z_1 ≤ ⋯ ≤ Z_n denoting again the ordered values of x_1, …, x_n. From (1.1.6) it is immediate that

X_{r:n} ≤ t  iff  Σ_{i=1}^n 1_{(−∞,t]}(ξ_i) ≥ r,  (1.1.7)

and hence,

X_{r:n} ≤ t  iff  F_n(t) ≥ r/n,  (1.1.8)

with

F_n(t) = n^{-1} Σ_{i=1}^n 1_{(−∞,t]}(ξ_i)  (1.1.9)

defining the sample d.f. F_n.
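The equivalence (1.1.8) between order statistics and the sample d.f. is easy to check numerically. The following is a minimal sketch (plain Python, with an arbitrary simulated Gaussian sample chosen purely for illustration):

```python
import random

def sample_df(xs, t):
    """Sample d.f. F_n(t) = n^{-1} * #{i: x_i <= t}, cf. (1.1.9)."""
    return sum(x <= t for x in xs) / len(xs)

random.seed(1)
n = 25
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
order = sorted(xs)  # order[r-1] is the rth order statistic X_{r:n}

# check (1.1.8): X_{r:n} <= t  iff  F_n(t) >= r/n
for r in range(1, n + 1):
    for t in [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]:
        assert (order[r - 1] <= t) == (sample_df(xs, t) >= r / n)
```

The check holds for every sample, not only for i.i.d. data, in accordance with the remark following (1.1.8).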
Given a sequence of independent and identically distributed (in short, i.i.d.) r.v.'s, the d.f. of an order statistic can easily be derived from (1.1.8) by using binomial probabilities. Keep in mind that (1.1.8) holds for every sequence ξ_1, …, ξ_n of r.v.'s.
Next, we turn to the basic relation between order statistics and the sample quantile function (in short, sample q.f.) F_n^{-1}. For this purpose we introduce the notion of the quantile function (in short, q.f.) of a d.f. F. Define

F^{-1}(q) = inf{t: F(t) ≥ q},  q ∈ (0, 1).  (1.1.10)
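Definition (1.1.10) can be evaluated directly for a discrete d.f., where the infimum is attained at the jump points. A minimal sketch, assuming an illustrative three-point distribution with exactly representable probabilities:

```python
import bisect

def quantile_function(points, cum_probs, q):
    """F^{-1}(q) = inf{t: F(t) >= q} for a discrete d.f. jumping at `points`."""
    # smallest index i with F(points[i]) >= q
    return points[bisect.bisect_left(cum_probs, q)]

# three-point distribution on {1, 2, 3} with probabilities 1/4, 1/2, 1/4
pts, cdf = [1, 2, 3], [0.25, 0.75, 1.0]
assert quantile_function(pts, cdf, 0.25) == 1   # F(1) = 0.25 >= 0.25
assert quantile_function(pts, cdf, 0.26) == 2
assert quantile_function(pts, cdf, 0.75) == 2
assert quantile_function(pts, cdf, 0.76) == 3
```

Note the left-continuity: at q equal to a jump value of F the infimum is still the smaller support point.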
Notice that the q.f. F^{-1} is a real-valued function. One could also define F^{-1}(0) := α(F) = inf{x: F(x) > 0} and F^{-1}(1) := ω(F) = sup{x: F(x) < 1}; then, however, F^{-1} is no longer real-valued in general. In Section 1.2 we shall indicate the possibility of defining a q.f. without referring to a d.f.

F^{-1}(q) is the smallest q-quantile of F, that is, if ξ is a r.v. with d.f. F then F^{-1}(q) is the smallest value t such that

P{ξ ≤ t} ≥ q.  (1.1.11)

For the sample d.f. F_n one obtains

F_n^{-1}(q) = Z_i  for (i − 1)/n < q ≤ i/n,  i = 1, …, n.  (1.1.12)
In particular,

F_n^{-1}(q) = X_{nq:n} if nq is an integer, and F_n^{-1}(q) = X_{[nq]+1:n} otherwise,  (1.1.13)

which may also be written as F_n^{-1}(q) = X_{⌈nq⌉:n}.  (1.1.13')
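As a sketch, assuming the standard identity F_n^{-1}(q) = X_{⌈nq⌉:n} for the sample q.f., one can check it against the inf-definition (1.1.10) applied to F_n:

```python
import math
import random

def sample_qf(xs, q):
    """F_n^{-1}(q) = inf{t: F_n(t) >= q}, evaluated directly from the definition."""
    order = sorted(xs)
    n = len(xs)
    for t in order:                       # F_n jumps only at sample points
        if sum(x <= t for x in xs) >= q * n:
            return t
    return order[-1]

random.seed(2)
xs = [random.random() for _ in range(10)]
order = sorted(xs)
for q in [0.05, 0.1, 0.25, 0.5, 0.73, 1.0]:
    assert sample_qf(xs, q) == order[math.ceil(10 * q) - 1]
```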
(1.2.1)

and

(1.2.2)

if F is continuous, where F^{-1} is the q.f. of F.
15
Let V 1 : n ::;;::;; Vn:n and, respectively, X 1 : n ::;; ::;; Xn:n be the order
statistics of '11' ... , '1n and ~ 1, ... , ~n. Since an increasing order of the observations is not destroyed by a monotone (nondecreasing) transformation one
obtains
(1.2.3)
and
(X l:n' . .. , Xn:n)
(1.2.4)
Some Preliminaries

Let us begin by noting the simple fact that given ordered values z_1 ≤ ⋯ ≤ z_n we get φ(z_1) ≤ ⋯ ≤ φ(z_n) if φ is nondecreasing [respectively, φ(z_n) ≤ ⋯ ≤ φ(z_1) if φ is nonincreasing].
Lemma 1.2.1. Let X_{r:n} be the rth order statistic of r.v.'s ξ_1, …, ξ_n with range R, φ a real-valued function with domain R, and X'_{r:n} the rth order statistic of the r.v.'s φ(ξ_1), …, φ(ξ_n). Then,

(i) X'_{r:n} = φ(X_{r:n})  if φ is nondecreasing,
(ii) X'_{r:n} = φ(X_{n−r+1:n})  if φ is nonincreasing.
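Both parts of Lemma 1.2.1 can be verified mechanically; here is a minimal sketch with two arbitrary illustrative transformations (one nondecreasing, one nonincreasing):

```python
import random

random.seed(3)
n, r = 9, 4
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

def order_stat(values, r):
    return sorted(values)[r - 1]

phi_up = lambda x: 2.0 * x + 1.0      # nondecreasing
phi_down = lambda x: -x ** 3          # nonincreasing

# (i)  X'_{r:n} = phi(X_{r:n})        for nondecreasing phi
assert order_stat([phi_up(x) for x in xs], r) == phi_up(order_stat(xs, r))
# (ii) X'_{r:n} = phi(X_{n-r+1:n})    for nonincreasing phi
assert order_stat([phi_down(x) for x in xs], r) == phi_down(order_stat(xs, n - r + 1))
```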
In particular,

Z_{r:n}(ξ_1, …, ξ_n) = −Z_{n−r+1:n}(−ξ_1, …, −ξ_n); e.g., Z_{1:n}(ξ_1, …, ξ_n) = −Z_{n:n}(−ξ_1, …, −ξ_n),  (1.2.7)

which indicates that results for the sample minimum can easily be deduced from those for the sample maximum.
q ≤ F(x)  iff  F^{-1}(q) ≤ x.  (1.2.9)

Moreover,

F(F^{-1}(q)) ≥ q,  0 < q < 1,  (1.2.11)

and

F^{-1}(F(x)) ≤ x  if 0 < F(x) < 1.  (1.2.12)

PROOF. Obvious from (1.2.11) and the fact that every q ∈ (0, 1) lies in the range of F.
F(ξ) =_d η  iff F is continuous.

PROOF. (ii) From (i) we know that ξ =_d F^{-1}(η). Thus, Criterion 1.2.3 implies that F(ξ) =_d F(F^{-1}(η)) = η if F is continuous. Conversely, for every x,

P{ξ = x} ≤ P{F(ξ) = F(x)} = 0.
Moreover, for every F-integrable function g,

∫ g dF = ∫_0^1 g(F^{-1}(x)) dx.  (1.2.14)

If η_1, η_2 are independent (0, 1)-uniformly distributed r.v.'s, then

P{η_1 < F(F^{-1}(η_2))} = P{η_1 < η_2};

thus, this probability is independent of the continuous d.f. F.
We remark that the probability integral transformation in case of not necessarily continuous d.f.'s will be given in Section 1.5.
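Identity (1.2.14) is easy to check numerically. A sketch for the standard exponential d.f. (chosen here purely as an example, with g(t) = t², so both integrals equal the second moment 2):

```python
import math

# E g(X) for X standard exponential, computed two ways, cf. (1.2.14):
# against the density f(t) = e^{-t}, and via the q.f. F^{-1}(q) = -log(1-q).
g = lambda t: t * t                       # E g(X) = 2
F_inv = lambda q: -math.log(1.0 - q)

N, T = 100_000, 40.0                      # midpoint rule on [0, T]
h = T / N
lhs = h * sum(g((i + 0.5) * h) * math.exp(-(i + 0.5) * h) for i in range(N))

M = 100_000                               # midpoint rule on (0, 1)
rhs = sum(g(F_inv((j + 0.5) / M)) for j in range(M)) / M

assert abs(lhs - 2.0) < 1e-3
assert abs(rhs - 2.0) < 1e-3
```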
Theorem 1.2.5. Let X_{1:n}, …, X_{n:n} be the order statistics of n i.i.d. random variables with common d.f. F. Then,

(i) (F^{-1}(U_{1:n}), …, F^{-1}(U_{n:n})) =_d (X_{1:n}, …, X_{n:n}),
and

(ii) (F(X_{1:n}), …, F(X_{n:n})) =_d (U_{1:n}, …, U_{n:n})  if F is continuous.

PROOF. Write ξ_i =_d F^{-1}(η_i), where ζ_1, …, ζ_n are i.i.d. random variables with common d.f. F and η_1, …, η_n are i.i.d. random variables with common uniform distribution on (0, 1). Moreover, w.l.g. the r.v.'s η_i are (0, 1)-valued. Since F^{-1} is a nondecreasing function it is immediate from Lemma 1.2.1 that

(X_{1:n}, …, X_{n:n}) =_d (Z_{1:n}(F^{-1}(η_1), …, F^{-1}(η_n)), …, Z_{n:n}(F^{-1}(η_1), …, F^{-1}(η_n))) = (F^{-1}(U_{1:n}), …, F^{-1}(U_{n:n})).
Corollary 1.2.6. Suppose that X_{1:n}, …, X_{n:n} are the order statistics of n i.i.d. random variables with common continuous d.f. F and X'_{1:n}, …, X'_{n:n} are the order statistics of n i.i.d. random variables with common d.f. G. Then,

(X'_{1:n}, …, X'_{n:n}) =_d (G^{-1}(F(X_{1:n})), …, G^{-1}(F(X_{n:n}))).  (1.2.15)

Since G^{-1} is defined on (0, 1) it may happen that the right-hand side of (1.2.15) is only defined on a set with probability one. This, however, creates no difficulties under the convention that the right-hand side is equal to some fixed constant on the set ∪_{i=1}^n {F(X_{i:n}) ∈ {0, 1}}, which has probability zero.
Corollary 1.2.7. Let U_{r:n} and X_{r:n} be as in Theorem 1.2.5(i). Then, for reals t_1, …, t_k and integers 1 ≤ r_1 < r_2 < ⋯ < r_k ≤ n we obtain,
(ii) For every real-valued, nondecreasing and left continuous function G with domain (0, 1) there exists a unique d.f. F such that G = F^{-1}.

We remark that the d.f. can be regained from its q.f. by

F(x) = sup{q ∈ (0, 1): F^{-1}(q) ≤ x}.

From Theorem 1.2.8 we know that it makes sense to say that a real-valued function G with domain (0, 1) is a q.f. if G is nondecreasing and left continuous. Since order statistics are more closely related to q.f.'s than to d.f.'s it is tempting to formulate assumptions via conditions imposed on q.f.'s instead of d.f.'s. However, we shall not follow this path because of the dominant role of d.f.'s in the literature.
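The relation F(x) = sup{q ∈ (0, 1): F^{-1}(q) ≤ x} can be checked directly for a discrete example. A sketch (the distribution is an illustrative choice; the sup over (0, 1) is approximated on a grid, so values are capped at the largest grid point 0.999):

```python
import bisect

pts, cdf = [1, 2, 3], [0.25, 0.75, 1.0]   # d.f. with exactly representable jumps

def F(x):
    i = bisect.bisect_right(pts, x)
    return cdf[i - 1] if i > 0 else 0.0

def F_inv(q):                              # (1.1.10)
    return pts[bisect.bisect_left(cdf, q)]

# F(x) = sup{q in (0,1): F^{-1}(q) <= x}, checked on a fine grid of q
qs = [k / 1000 for k in range(1, 1000)]
for x in [1, 1.5, 2, 2.9, 3]:
    sup_q = max(q for q in qs if F_inv(q) <= x)
    assert abs(sup_q - min(F(x), 0.999)) < 1e-9
```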
F_n^{-1}(q) → F_0^{-1}(q),  n → ∞.

To prove the converse conclusion repeat the argument above with (1.2.9) and (1.2.12) replaced by Lemma A.1.3 and (1.2.11). □

Let F_n denote again the sample d.f. According to the Glivenko-Cantelli theorem, sup_t |F_n(t) − F(t)| → 0, n → ∞, w.p. 1. Thus one obtains as an immediate consequence of Lemma 1.2.9 that, w.p. 1, the sample q.f. F_n^{-1} converges to the underlying q.f. F^{-1} at every continuity point of F^{-1}.
Lemma 1.3.1. Let X_{r:n} be the rth order statistic of n i.i.d. random variables ξ_1, …, ξ_n with common d.f. F. Then,

P{X_{r:n} ≤ t} = Σ_{i=r}^n C(n, i) F(t)^i (1 − F(t))^{n−i},  (1.3.1)

where C(n, i) denotes the binomial coefficient.
Lemma 1.3.1 proves once more the special case of k = 1 in Corollary 1.2.7.
It is obvious from (1.3.1) that

P{X_{n:n} ≤ t} = F(t)^n,  (1.3.2)

and

P{X_{1:n} ≤ t} = 1 − (1 − F(t))^n.  (1.3.3)

Notice that (1.3.2) can easily be proved in a direct way since for i.i.d. random variables ξ_1, …, ξ_n we have

P{X_{n:n} ≤ t} = P{ξ_1 ≤ t, …, ξ_n ≤ t} = F(t)^n.
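Formula (1.3.1) and its special case (1.3.2) can be checked against a seeded simulation; a minimal sketch for (0, 1)-uniform r.v.'s (the parameter values n = 5, r = 3, t = 0.6 are arbitrary illustrations):

```python
import math
import random

def df_order_stat(r, n, u):
    """(1.3.1) for (0,1)-uniform r.v.'s: P{U_{r:n} <= t}, with u = F(t) = t."""
    return sum(math.comb(n, i) * u**i * (1 - u) ** (n - i) for i in range(r, n + 1))

random.seed(4)
n, r, t, trials = 5, 3, 0.6, 100_000
hits = sum(sorted(random.random() for _ in range(n))[r - 1] <= t
           for _ in range(trials))
assert abs(hits / trials - df_order_stat(r, n, t)) < 0.01
# the special case r = n reduces to (1.3.2): F(t)^n
assert abs(df_order_stat(n, n, t) - t ** n) < 1e-12
```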
It is apparent that if ξ_1, …, ξ_n are independent and not necessarily identically distributed (in short, i.n.n.i.d.) r.v.'s then

P{X_{n:n} ≤ t} = Π_{i=1}^n F_i(t).  (1.3.4)
Theorem 1.3.2. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f. F and density f. Then, X_{r:n} has the density

f_{r:n} = n! f F^{r−1}(1 − F)^{n−r} / ((r − 1)!(n − r)!).  (1.3.5)

PROOF. From Lemma 1.3.1 we know that the d.f. of X_{r:n}, say, G can be written as the composition G = H ∘ F where the function H is defined by H(t) = Σ_{i=r}^n C(n, i) t^i (1 − t)^{n−i}. For every t where f(t) is the derivative of F at t we know that the derivative of G at t exists and G'(t) = f(t)H'(F(t)); it suffices to prove that G'(t) = f_{r:n}(t). The derivative of H is given by

H'(t) = n! t^{r−1}(1 − t)^{n−r} / ((r − 1)!(n − r)!),  (1.3.6)

and hence the assertion of the theorem holds. For proving (1.3.6) check that

H'(t) = Σ_{i=r}^n i C(n, i) t^{i−1}(1 − t)^{n−i} − Σ_{i=r}^{n−1} (n − i) C(n, i) t^i (1 − t)^{n−i−1}
 = Σ_{i=r}^n i C(n, i) t^{i−1}(1 − t)^{n−i} − Σ_{i=r+1}^n i C(n, i) t^{i−1}(1 − t)^{n−i}
 = r C(n, r) t^{r−1}(1 − t)^{n−r} = n! t^{r−1}(1 − t)^{n−r} / ((r − 1)!(n − r)!),
where we used the identity i C(n, i) = (n − i + 1) C(n, i − 1) = n!/((i − 1)!(n − i)!).
In particular, the sample maximum X_{n:n} has the density

f_{n:n} = n f F^{n−1},  (1.3.7)

and the rth order statistic U_{r:n} of n i.i.d. (0, 1)-uniformly distributed r.v.'s has the density

g_{r:n}(x) = x^{r−1}(1 − x)^{n−r} / b(r, n − r + 1),  0 < x < 1,  (1.3.8)

where b(r, s) = ∫_0^1 x^{r−1}(1 − x)^{s−1} dx is the beta function. Recall that b(r, s) = Γ(r)Γ(s)/Γ(r + s) where Γ is the gamma function [with Γ(r) = (r − 1)! for positive integers r].
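The closed form (1.3.6) for H' can be sanity-checked against a central difference quotient of H; a minimal sketch over a few arbitrary (r, n, t) combinations:

```python
import math

def H(t, r, n):
    """d.f. of U_{r:n}: sum_{i=r}^n C(n,i) t^i (1-t)^{n-i}."""
    return sum(math.comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(r, n + 1))

def H_prime(t, r, n):
    """Closed form (1.3.6): n! t^{r-1} (1-t)^{n-r} / ((r-1)!(n-r)!)."""
    c = math.factorial(n) // (math.factorial(r - 1) * math.factorial(n - r))
    return c * t ** (r - 1) * (1 - t) ** (n - r)

eps = 1e-6
for r, n in [(1, 4), (3, 7), (5, 5)]:
    for t in [0.2, 0.5, 0.8]:
        numeric = (H(t + eps, r, n) - H(t - eps, r, n)) / (2 * eps)
        assert abs(numeric - H_prime(t, r, n)) < 1e-5
```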
The following example, concerning sample medians, gives a flavor of the
asymptotic treatment of central order statistics. It indicates that central order
statistics are asymptotically normal.
Let φ denote the standard normal density given by

φ(x) = (2π)^{−1/2} exp(−x²/2).
Deduce from (1.3.8) that the density h_m of the normalized sample median

2(2m)^{1/2}(U_{m+1:2m+1} − 1/2)

is given by

h_m(x) = c(m)(1 − x²/(2m))^m  for x² < 2m,

and = 0, otherwise, where c(m) is a normalizing constant. Since

(1 − x²/(2m))^m → exp(−x²/2),  m → ∞,

one also obtains c(m) → (2π)^{−1/2}, m → ∞, and hence

h_m(x) → φ(x),  m → ∞,  (1.3.9)
for every x. The Scheffé Lemma 3.3.2 yields that the distribution of the normalized sample median converges to the standard normal distribution w.r.t. the
variational distance as the sample size goes to infinity.
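The pointwise limit used above is elementary to verify numerically; a minimal sketch:

```python
import math

# (1 - x^2/(2m))^m -> exp(-x^2/2) as m -> infinity, pointwise
for x in [0.0, 0.5, 1.0, 2.0]:
    target = math.exp(-x * x / 2)
    errs = [abs((1 - x * x / (2 * m)) ** m - target) for m in (10, 100, 1000, 10000)]
    assert errs[-1] < 1e-3
    assert errs == sorted(errs, reverse=True)   # error shrinks as m grows
```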
The limiting d.f.'s of sample maxima are, up to location and scale parameters, the following extreme value d.f.'s:

G_{1,α}(x) = 0 if x ≤ 0, and G_{1,α}(x) = exp(−x^{−α}) if x > 0  ("Fréchet"),

G_{2,α}(x) = exp(−(−x)^α) if x ≤ 0, and G_{2,α}(x) = 1 if x > 0  ("Weibull"),  (1.3.10)

G_3(x) = exp(−e^{−x}) for every x  ("Gumbel"),
where α > 0 is a shape parameter. We say that two d.f.'s G_1 and G_2 are of the same type if G_1(b + ax) = G_2(x) for some a > 0 and real b. Frequently, it will be convenient to write G_{3,α} in place of G_3 where α is always understood to be equal to 1. The following identities show that the d.f.'s G_{i,α} are in fact limiting d.f.'s of sample maxima. We have

G_{1,α}^n(n^{1/α}x) = G_{1,α}(x),
G_{2,α}^n(n^{−1/α}x) = G_{2,α}(x),  (1.3.11)
G_3^n(x + log n) = G_3(x).
Thus, G_{i,α}^n(c_n x + d_n) = G_{i,α}(x) with the normalizing constants

c_n = n^{1/α}, d_n = 0  if i = 1,
c_n = n^{−1/α}, d_n = 0  if i = 2,  (1.3.13)
c_n = 1, d_n = log n  if i = 3.
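The max-stability identities (1.3.11) hold exactly, not just asymptotically, and can be verified to floating-point accuracy; a minimal sketch (n = 50, α = 1.7 are arbitrary choices):

```python
import math

def frechet(x, a):   # G_{1,alpha}
    return math.exp(-(x ** (-a))) if x > 0 else 0.0

def weibull(x, a):   # G_{2,alpha}
    return math.exp(-((-x) ** a)) if x <= 0 else 1.0

def gumbel(x):       # G_3
    return math.exp(-math.exp(-x))

n, a = 50, 1.7
for x in [0.3, 1.0, 2.5]:
    assert abs(frechet(n ** (1 / a) * x, a) ** n - frechet(x, a)) < 1e-12
for x in [-2.0, -0.5, -0.1]:
    assert abs(weibull(n ** (-1 / a) * x, a) ** n - weibull(x, a)) < 1e-12
for x in [-1.0, 0.0, 2.0]:
    assert abs(gumbel(x + math.log(n)) ** n - gumbel(x)) < 1e-12
```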
Notice that G_{2,1}(x) = e^x, x < 0, defines the "negative" standard exponential d.f. The d.f. G_{2,1} will usually be taken as a starting point of our investigations. This is partly due to the fact that G_{2,1} is the limiting d.f. of the maximum U_{n:n} of (0, 1)-uniformly distributed r.v.'s. To prove this, notice that

P{n(U_{n:n} − 1) ≤ x} = (1 + x/n)^n,  −n ≤ x ≤ 0,  (1.3.14)

and

(1 + x/n)^n → e^x = G_{2,1}(x),  n → ∞. □
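A seeded simulation confirms both the exact identity (1.3.14) and the closeness to the limit G_{2,1} already for moderate n; a minimal sketch with the arbitrary choice n = 20, x = −1:

```python
import math
import random

random.seed(5)
n, x, trials = 20, -1.0, 100_000
# P{n(U_{n:n} - 1) <= x} = (1 + x/n)^n, cf. (1.3.14)
hits = sum(n * (max(random.random() for _ in range(n)) - 1.0) <= x
           for _ in range(trials))
exact = (1 + x / n) ** n
assert abs(hits / trials - exact) < 0.01
# and (1 + x/n)^n is already close to the limit G_{2,1}(x) = e^x
assert abs(exact - math.exp(x)) < 0.01
```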
The corresponding limiting d.f.'s of sample minima are given by

F_{i,α}(x) = 1 − G_{i,α}(−x); e.g., F_3(x) = 1 − G_3(−x),  (1.3.15)

and a d.f. F belongs to their domain of attraction if, for suitable constants c_n > 0 and d_n,

1 − (1 − F(c_n x + d_n))^n → F_{i,α}(x),  n → ∞.  (1.3.16)

In statistical applications one also has to include location and scale parameters μ and σ, leading to the d.f.'s G_{i,α}((x − μ)/σ).
The Fréchet, Weibull, and Gumbel d.f.'s can be put in a unified form, the von Mises parametrization:

H_β(x) = exp(−(1 + βx)^{−1/β}),  1 + βx > 0.  (1.3.17)

Moreover,

H_0(x) = exp(−e^{−x}) = G_3(x).  (1.3.18)

Since (1 + βx)^{1/β} → e^x, β → 0, it is clear that H_β(x) → H_0(x), β → 0. The Fréchet and Weibull d.f.'s can be regained from H_β by the identities

G_{1,1/β}(x) = H_β((x − 1)/β)  if β > 0,

and

G_{2,−1/β}(x) = H_β(−(x + 1)/β)  if β < 0.  (1.3.19)

The corresponding density h_β is positive for x > −1/β if β > 0, and for x < −1/β if β < 0, and = 0, otherwise.
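Identities (1.3.18)-(1.3.19) are exact and checkable; a minimal sketch (β = 0.5 and the evaluation points are arbitrary illustrations):

```python
import math

def H(beta, x):
    """(1.3.17)-(1.3.18): extreme value d.f.'s in von Mises parametrization."""
    if beta == 0.0:
        return math.exp(-math.exp(-x))          # Gumbel limit H_0 = G_3
    if 1 + beta * x <= 0:
        return 0.0 if beta > 0 else 1.0
    return math.exp(-((1 + beta * x) ** (-1.0 / beta)))

# (1.3.19), beta > 0: Frechet G_{1,1/beta}(x) = H_beta((x - 1)/beta)
beta = 0.5
for x in [0.5, 1.0, 3.0]:
    assert abs(H(beta, (x - 1) / beta) - math.exp(-(x ** (-1 / beta)))) < 1e-12

# H_beta -> H_0 as beta -> 0
for x in [-1.0, 0.0, 2.0]:
    assert abs(H(1e-8, x) - H(0.0, x)) < 1e-6
```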
Figure 1.3.1 shows the standard Gumbel density h_0. Notice that the mode of the standard Gumbel density is equal to zero.
Figure 1.3.2 indicates the convergence of the rescaled Fréchet densities to the Gumbel density as β ↓ 0. Figure 1.3.3 concerns the convergence of the rescaled Weibull densities to the Gumbel density as β ↑ 0.
The illustrations indicate that extreme value densities, in their von Mises parametrization, form a nice, smooth family of densities. Fréchet densities (recall that this is the case of β > 0 in the von Mises parametrization) are skewed to the right. This property is shared by the Gumbel density and
Figure 1.3.2. Gumbel density h_0 and Fréchet densities h_β (von Mises parametrization) with parameters β = 0.3, 0.6, 0.9.

Figure 1.3.3. Gumbel density h_0 and Weibull densities h_β (von Mises parametrization) with parameters β = −0.75, −0.5, −0.25.
Weibull densities for β = −1/α larger than −1/3.6. For parameters β close to −1/3.6 (that is, α close to 3.6) the Weibull densities look symmetrical. Finally, for parameters β smaller than −1/3.6 the Weibull densities are skewed to the left. For illustrations of Fréchet and Weibull densities, with large parameters |β|, we refer to Figures 5.1.1 and 5.1.2.
In Figure 1.3.4 we demonstrate that for certain location, scale and shape parameters μ, σ and α = −1/β it is difficult to distinguish visually the Weibull density from a normal density. Those readers having good eyes will recognize

Figure 1.3.4. Standard normal density and Weibull density (dotted line) with parameters μ = 3.14, σ = 3.48, and α = 3.6.

a difference at the tails of the densities (with the dotted line indicating the Weibull density).
The vector (X_{1:n}, …, X_{n:n}) of order statistics of n i.i.d. random variables with common density f has the density

x → n! Π_{i=1}^n f(x_i)  if x_1 < x_2 < ⋯ < x_n,

and = 0, otherwise.
PROOF. Let S_n be the permutation group on {1, …, n}; thus, (τ(1), …, τ(n)) is a permutation of (1, …, n) for every τ ∈ S_n. Define B_τ = {ξ_{τ(1)} < ξ_{τ(2)} < ⋯ < ξ_{τ(n)}} for every τ ∈ S_n. Note that X_n := (X_{1:n}, …, X_{n:n}) = (ξ_{τ(1)}, …, ξ_{τ(n)}) on B_τ, and (ξ_{τ(1)}, …, ξ_{τ(n)}) has the same distribution as (ξ_1, …, ξ_n). Moreover, since the r.v.'s ξ_i have a continuous d.f. we know that ξ_i and ξ_j have no ties for i ≠ j (that is, P{ξ_i = ξ_j} = 0) so that P(∪_{τ∈S_n} B_τ) = 1. Finally, notice that the sets B_τ, τ ∈ S_n, are mutually disjoint. Let A_0 = {(x_1, …, x_n): x_1 < x_2 < ⋯ < x_n}, and let A be any Borel set. We obtain

P{X_n ∈ A} = Σ_{τ∈S_n} P({X_n ∈ A} ∩ B_τ) = Σ_{τ∈S_n} P{(ξ_{τ(1)}, …, ξ_{τ(n)}) ∈ A ∩ A_0} = n! P{(ξ_1, …, ξ_n) ∈ A ∩ A_0}. □

Example 1.4.2. (i) If ξ_1, …, ξ_n are i.i.d. standard exponential r.v.'s, the corresponding density of the order statistic vector is

x → n! exp(−Σ_{i=1}^n x_i)  if 0 < x_1 < ⋯ < x_n,

and = 0, otherwise.
(ii) If ξ_1, …, ξ_n are i.i.d. random variables with uniform distribution on (0, 1), the corresponding density is

x → n!  if 0 < x_1 < ⋯ < x_n < 1,  (1.4.3)

and = 0, otherwise.
Using Example 1.4.2(i) we shall prove that spacings X_{r:n} − X_{r−1:n} of exponential r.v.'s are independent (see Theorem 1.6.1). As an application one obtains the following lemma, which will be the decisive tool to establish the joint density of several (in other words, sparse) order statistics X_{r_1:n}, …, X_{r_k:n}.

Lemma 1.4.3. Let X_{i:n} be the ith order statistic of n i.i.d. standard exponential r.v.'s. Then, for 1 ≤ r_1 < ⋯ < r_k ≤ n, the following two results hold:
PROOF. (i) follows from Theorem 1.6.1 since X_{1:n}, X_{2:n} − X_{1:n}, …, X_{n:n} − X_{n−1:n} are independent.
(ii) From Theorem 1.6.1 we also know that (n − r + 1)(X_{r:n} − X_{r−1:n}) is a standard exponential r.v. Hence, using an appropriate representation of X_{s:n} − X_{r:n} by means of spacings we obtain for 0 ≤ r < s ≤ n,

X_{s:n} − X_{r:n} = Σ_{i=1}^{s−r} [(n − (r + i) + 1)(X_{r+i:n} − X_{r+i−1:n})] / (n − (r + i) + 1)
 =_d Σ_{i=1}^{s−r} [((n − r) − i + 1)(X_{i:n−r} − X_{i−1:n−r})] / ((n − r) − i + 1) = X_{s−r:n−r}. □
From Lemma 1.4.3 and Theorem 1.3.2 we shall deduce the density of X_{r_i:n} − X_{r_{i−1}:n}, and at the next step the joint density of

X_{r_1:n}, X_{r_2:n} − X_{r_1:n}, …, X_{r_k:n} − X_{r_{k−1}:n}

in the special case of exponential r.v.'s. Therefore, the joint density of order statistics X_{r_1:n}, …, X_{r_k:n} of exponential r.v.'s can easily be established by means of a simple application of the transformation theorem for densities: if a random vector has the density f and T is a smooth, one-to-one map, then the transformed vector has the density

x → f(T^{-1}(x)) |1/det(∂T/∂x)(T^{-1}(x))|.  (1.4.4)
Theorem 1.4.5. Let X_{r_1:n}, …, X_{r_k:n}, 1 ≤ r_1 < ⋯ < r_k ≤ n, be order statistics of n i.i.d. random variables with common d.f. F and density f. Their joint density is

f_{r_1,…,r_k:n}(x) = n! (Π_{i=1}^k f(x_i)) Π_{i=1}^{k+1} (F(x_i) − F(x_{i−1}))^{r_i−r_{i−1}−1} / (r_i − r_{i−1} − 1)!

if 0 < F(x_1) < F(x_2) < ⋯ < F(x_k) < 1, and = 0, otherwise. [We use the convention that F(x_0) = 0, F(x_{k+1}) = 1, r_0 = 0, and r_{k+1} = n + 1.]
PROOF. (I) First assume that ξ_1, …, ξ_n are standard exponential r.v.'s. Lemma 1.4.3 and Theorem 1.3.2 imply that the joint density g of

X_{r_1:n}, X_{r_2:n} − X_{r_1:n}, …, X_{r_k:n} − X_{r_{k−1}:n}

is a product over i = 1, …, k of densities of single order statistics, positive for x_i > 0, i = 1, …, k, and = 0, otherwise. From (1.4.4) and Example 1.4.4 we get, writing in short f_{r:n} instead of f_{r_1,…,r_k:n}, that for 0 = x_0 < x_1 < ⋯ < x_k the density f_{r:n}(x_1, …, x_k) is, up to the factorial constants, equal to

Π_{i=1}^{k+1} [1 − e^{−(x_i−x_{i−1})}]^{r_i−r_{i−1}−1} Π_{i=1}^k e^{−(n−r_i+1)(x_i−x_{i−1})},

and f_{r:n} = 0, otherwise. The proof for the exponential case is complete.
(II) For X_{i:n} as in part I we obtain, according to Theorem 1.2.5(ii), that

(U_{r_1:n}, …, U_{r_k:n}) =_d (G(X_{r_1:n}), …, G(X_{r_k:n})),

where G(x) = 1 − e^{−x}, x ≥ 0. Using this representation, the assertion in the uniform case is immediate from part I and (1.4.4) applied to B = {x: 0 < x_1 < ⋯ < x_k} and T(x) = (G(x_1), …, G(x_k)).

(III) In the general case one computes

P{X_{r_1:n} ≤ t_1, …, X_{r_k:n} ≤ t_k} = ∫ 1_{×_{i=1}^k (−∞,F(t_i)]}(F(x_1), …, F(x_k)) g_{r:n}(F(x_1), …, F(x_k)) dQ^k(x_1, …, x_k),

where this identity follows by means of the probability integral transformation (Lemma 1.2.4(ii)). This lemma is applicable since F is continuous. The proof is complete if

1_{(−∞,F(t)]}(F(x)) = 1_{(−∞,t]}(x)  for Q-almost all x.

This, however, is obvious from the fact that (−∞, t] ⊂ {y: F(y) ≤ F(t)} and that both sets have equal probability w.r.t. Q (prove this by applying the probability integral transformation). □
Remark 1.4.6. The condition 0 < F(x_1) < ⋯ < F(x_k) < 1 in Theorem 1.4.5 can be replaced by the condition x_1 < ⋯ < x_k. To prove this notice that

{0 < F(ξ_1) < ⋯ < F(ξ_k) < 1} ⊂ {ξ_1 < ⋯ < ξ_k}

and show that both sets have equal probability.
In particular, the joint density of the k smallest order statistics X_{1:n}, …, X_{k:n} is

x → (n!/(n − k)!) [Π_{i=1}^k f(x_i)] (1 − F(x_k))^{n−k},  x_1 < ⋯ < x_k,  (1.4.7)

and that of the k largest order statistics X_{n−k+1:n}, …, X_{n:n} is

x → (n!/(n − k)!) [Π_{i=1}^k f(x_i)] F(x_1)^{n−k},  x_1 < ⋯ < x_k.  (1.4.8)
Theorem 1.5.1. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common continuous d.f. F. Then, X_{r:n} has the F-density

n! F^{r−1}(1 − F)^{n−r} / ((r − 1)!(n − r)!).  (1.5.1)

PROOF. We have to show that

P{X_{r:n} ≤ x} = ∫_{−∞}^x H'(F) dF

with H' as in (1.3.6). According to (1.2.4), Criterion 1.2.3 and (1.2.9), the right-hand side above is equal to ∫_0^{F(x)} H'(t) dt. Moreover,

∫_0^{F(x)} H'(t) dt = H(F(x)) = P{X_{r:n} ≤ x}. □

Notice that Theorem 1.3.2 is immediate from Theorem 1.5.1 under the condition that F is absolutely continuous.
Theorem 1.5.2 extends this result to the joint F-density of several order statistics, with the conventions F(x_0) = 0 and F(x_{k+1}) = 1.  (1.5.2)

Note that Theorem 1.4.5 is immediate from Theorem 1.5.2 since Q^k has the Lebesgue density x → Π_{i=1}^k f(x_i) if Q has the Lebesgue density f.
Remark 1.5.3. Part III of the proof of Theorem 1.4.5 shows that the following
result holds true: Let Qo be the uniform distribution on (0, 1) and let Q1 be a
probability measure with continuous d.f. F.
Lemma 1.5.4 (generalized probability integral transformation). Let ξ have the d.f. F, let η be uniformly distributed on (0, 1) and independent of ξ, and define

H(y, x) = F(x−) + y(F(x) − F(x−)).  (1.5.3)

Then, H(η, ξ) is uniformly distributed on (0, 1).

PROOF. It suffices to prove that P{H(η, ξ) < q} = q for every q ∈ (0, 1). From (1.2.9) we know that ξ < F^{-1}(q) implies F(ξ) < q and ξ > F^{-1}(q) implies F(ξ) ≥ q. Therefore, by setting x = F^{-1}(q), we can evaluate P{H(η, ξ) < q}. Equivalently, with K(·|x) denoting the distribution of F(x−) + η(F(x) − F(x−)),

KQ = ∫ K(·|x) dF(x)

is the uniform distribution on (0, 1).
Indeed, for every t,

∫ K((−∞, t]|x) dF(x) = ∫ P{F(x−) + η(F(x) − F(x−)) ≤ t} dF(x) = P{F(ξ−) + η(F(ξ) − F(ξ−)) ≤ t} = t.
Theorem 1.5.6. For 1 ≤ k ≤ n and 0 = r_0 < r_1 < ⋯ < r_k < r_{k+1} = n + 1 the Q^k-density of (X_{r_1:n}, …, X_{r_k:n}), say, f_{r_1,…,r_k:n} is given by

f_{r_1,…,r_k:n}(x) = ∫_{(0,1)^k} g_{r_1,…,r_k:n}(H(y_1, x_1), …, H(y_k, x_k)) dy_1 ⋯ dy_k,

where g_{r_1,…,r_k:n} is the joint density of U_{r_1:n}, …, U_{r_k:n}.
PROOF. The proof runs along the lines of part (III) in the proof of Theorem 1.4.5. Instead of Lemma 1.2.4(ii) apply its extension Lemma 1.5.4 to discontinuous d.f.'s. We have

P{X_{r_1:n} ≤ t_1, …, X_{r_k:n} ≤ t_k}
 = E[1_{×_{i=1}^k (−∞,F(t_i)]}(H(η_1, ξ_1), …, H(η_k, ξ_k)) g_{r_1,…,r_k:n}(H(η_1, ξ_1), …, H(η_k, ξ_k))]
 = E[1_{×_{i=1}^k (−∞,t_i]}(ξ_1, …, ξ_k) g_{r_1,…,r_k:n}(H(η_1, ξ_1), …, H(η_k, ξ_k))],

where η_1, ξ_1, …, η_k, ξ_k are independent r.v.'s such that ξ_1, …, ξ_k possess the common d.f. F, and η_1, …, η_k are uniformly distributed on (0, 1). The second identity is established in the same way as the corresponding step in the proof of Theorem 1.4.5 by applying Lemma 1.5.4 instead of Lemma 1.2.4(ii). Now the assertion is immediate by applying Fubini's theorem. □
Notice that H(y_1, x_1) < H(y_2, x_2) if and only if either x_1 < x_2, or x_1 = x_2
and Y1 < Y2. Hence, by using the lexicographical ordering one may write
Theorem 1.5.6 in a different way:
Corollary 1.5.7. Define B_k as the set of all vectors (x_1, y_1, …, x_k, y_k) with 0 < y_i < 1, i = 1, …, k, and x_i < x_{i+1} or x_i = x_{i+1} and y_i < y_{i+1} for i = 1, …, k − 1. Then, the density f_{r_1,…,r_k:n}, given in Theorem 1.5.6, is of the following form:
36
=,n . .k+1
n [H(y;,;x) (- . H(y
_.
r,
Bk .=1
;-1,
r,-l
;-1
)],;-,;_,-1
_ 1)'
Y1 ... Yk
where the summation runs over all subsets S of {1, …, n} with m elements. Moreover, X^S_{1:n} ≤ ⋯ ≤ X^S_{n:n} are the order statistics of n i.i.d. random variables with common d.f.

F^S = |S|^{-1} Σ_{i∈S} F_i.
Theorem 1.6.1. Let X_{1:n} ≤ ⋯ ≤ X_{n:n} be the order statistics of n i.i.d. standard exponential r.v.'s. Then,

(i) the spacings X_{1:n}, X_{2:n} − X_{1:n}, …, X_{n:n} − X_{n−1:n} are independent, and
(ii)

x → Π_{i=1}^n exp(−x_i) 1_{(0,∞)}(x_i)

is a joint density of

nX_{1:n}, (n − 1)(X_{2:n} − X_{1:n}), …, (X_{n:n} − X_{n−1:n}).

PROOF. From Example 1.4.2(i), where the density of the order statistic of exponential r.v.'s was established, the desired result is immediate by applying the transformation theorem for densities to the map T = (T_1, …, T_n) defined by

T_i(x) = (n − i + 1)(x_i − x_{i−1}),  i = 1, …, n.

Notice that det(∂T/∂x) = n! and T^{-1}(x) = (Σ_{j=1}^i x_j/(n − j + 1))_{i=1}^n. Moreover, use the fact that Σ_{i=1}^n Σ_{j=1}^i x_j/(n − j + 1) = Σ_{j=1}^n x_j. □
From Theorem 1.6.1 the following representation for order statistics Xr:n
of exponential r.v.'s is immediate:
X_{r:n} =_d Σ_{j=1}^{r} η_j/(n − j + 1),  r = 1, ..., n,  (1.6.1)
where η_1, ..., η_n are i.i.d. standard exponential r.v.'s.
Note that spacings of independent r.v.'s η_1, ..., η_n with common d.f.
F(x) = 1 − exp[−a(x − b)], x ≥ b, are also independent. It is well known (see
e.g. Galambos (1987), Theorem 1.6.3) that these d.f.'s are the only continuous
d.f.'s for which the spacings are independent.
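The representation (1.6.1) and the independence of normalized exponential spacings are easy to check empirically. The following stdlib-Python simulation (our illustrative sketch, not part of the text) computes the normalized spacings (n − i + 1)(X_{i:n} − X_{i−1:n}) and verifies that each has sample mean about 1, as befits i.i.d. standard exponential r.v.'s.

```python
import random

random.seed(1)

def normalized_spacings(n):
    """Sort n standard exponential variates and return the
    normalized spacings (n - i + 1)(X_{i:n} - X_{i-1:n})."""
    xs = sorted(random.expovariate(1.0) for _ in range(n))
    prev = 0.0
    out = []
    for i, x in enumerate(xs, start=1):
        out.append((n - i + 1) * (x - prev))
        prev = x
    return out

n, trials = 5, 200_000
sums = [0.0] * n
for _ in range(trials):
    for i, d in enumerate(normalized_spacings(n)):
        sums[i] += d
means = [s / trials for s in sums]
# each normalized spacing should be standard exponential, hence mean 1
print(means)
```

With 200,000 replications the Monte Carlo error of each mean is about 0.002, so all entries land close to 1.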
PROOF. Let X_{r:n} be as in Theorem 1.6.1 and let F be the standard exponential
d.f. Since U_{r:n} =_d F(X_{r:n}) we get
Since U_{r:n}/U_{r+1:n} and U_{r+1:n} are independent one could have the idea that
U_{r:n} and U_{r:n}/U_{r+1:n} are also independent, which however is wrong. This becomes
obvious by noting that 0 ≤ U_{r:n} ≤ U_{r:n}/U_{r+1:n} ≤ 1.
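Both halves of this remark can be illustrated numerically. The following hedged sketch (our code, with U_{n+1:n} = 1) checks Malmquist's result that the ratios (U_{r:n}/U_{r+1:n})^r behave like i.i.d. uniform r.v.'s (sample mean near 1/2), and exhibits the positive correlation between U_{1:n} and U_{1:n}/U_{2:n} that rules out their independence.

```python
import random

random.seed(2)

n, trials = 4, 200_000
sums = [0.0] * n               # running sums of (U_{r:n}/U_{r+1:n})**r
dep_xy = dep_x = dep_y = 0.0   # for Cov(U_{1:n}, U_{1:n}/U_{2:n})
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    u.append(1.0)              # convention U_{n+1:n} = 1
    for r in range(1, n + 1):
        sums[r - 1] += (u[r - 1] / u[r]) ** r
    x, y = u[0], u[0] / u[1]
    dep_x += x; dep_y += y; dep_xy += x * y

means = [s / trials for s in sums]                       # each about 1/2
cov = dep_xy / trials - (dep_x / trials) * (dep_y / trials)  # clearly > 0
print(means, cov)
```

A small calculation for n = 4 gives Cov(U_{1:n}, U_{1:n}/U_{2:n}) = 2/15 − (1/5)(1/2) = 1/30, which the estimate reproduces.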
g_{n+1}(x_1, ..., x_n) = n! (∏_{i=1}^{n+1} a_i) [a_{n+1} + Σ_{i=1}^{n} (a_i − a_{n+1})x_i]^{−(n+1)}
if x_i > 0, i = 1, ..., n, and Σ_{i=1}^{n} x_i < 1, and g_{n+1} = 0, otherwise.
PROOF. Put T(y) = (y_1/Σ_{j=1}^{n+1} y_j, ..., y_n/Σ_{j=1}^{n+1} y_j, Σ_{j=1}^{n+1} y_j). The inverse
of T is given by T^{−1}(x) = (x_1 x_{n+1}, ..., x_n x_{n+1}, x_{n+1}(1 − Σ_{i=1}^{n} x_i)), and the
Jacobian matrix ∂T^{−1}/∂x has the entries x_{n+1} on the diagonal, x_1, ..., x_n in the
last column and −x_{n+1} in the last row, so that det(∂T^{−1}/∂x) = x_{n+1}^n. Hence, by
the transformation theorem for densities, a joint density of (η_1/Σ_j η_j, ..., η_n/Σ_j η_j, Σ_j η_j) is
h(x_1, ..., x_{n+1}) = (∏_{i=1}^{n+1} a_i) x_{n+1}^n exp(−x_{n+1}[a_{n+1} + Σ_{i=1}^{n} (a_i − a_{n+1})x_i])
for x_i > 0, i = 1, ..., n, Σ_{i=1}^{n} x_i < 1 and x_{n+1} > 0. Integrating over x_{n+1} yields
g_{n+1}(x_1, ..., x_n) = n! (∏_{i=1}^{n+1} a_i) [a_{n+1} + Σ_{i=1}^{n} (a_i − a_{n+1})x_i]^{−(n+1)}. □
Lemma 1.6.4 will only be applied in the special case of i.i.d. random
variables. We specialize Lemma 1.6.4 to the case of a 1 = a2 = ... = an+l = 1.
Lemma 1.6.6. Let η_1, ..., η_{n+1} be i.i.d. standard exponential r.v.'s. Then,
(i) (η_i/Σ_{j=1}^{n+1} η_j)_{i=1}^{n} and Σ_{j=1}^{n+1} η_j are independent,
(ii) Σ_{j=1}^{k} η_j is a gamma r.v. with parameter k, k = 1, ..., n + 1, and
(iii) η_1, η_1 + η_2, ..., η_1 + ... + η_n have the joint density x → exp(−x_n)
if 0 < x_1 < ... < x_n, and zero otherwise.
(i) and (ii) are obvious since the density g_{n+1} in Lemma 1.6.4 is of the
form
g_{n+1}(x_1, ..., x_n) = n!  for x_i > 0, i = 1, ..., n, Σ_{i=1}^{n} x_i < 1.
We prove that spacings of (0, 1)-uniformly distributed r.v.'s have the same
joint distribution as the r.v.'s η_r/(Σ_{j=1}^{n+1} η_j) above by comparing the densities
of the distributions.
Theorem 1.6.7. If η_1, ..., η_{n+1} are i.i.d. standard exponential r.v.'s, then
(U_{r:n} − U_{r−1:n})_{r=1}^{n+1} =_d (η_r / Σ_{j=1}^{n+1} η_j)_{r=1}^{n+1}  (1.6.2)
(where U_{0:n} = 0 and U_{n+1:n} = 1).
η_1/(Σ_{j=1}^{n+1} η_j), ..., η_{n+1}/(Σ_{j=1}^{n+1} η_j) are also exchangeable. Thus, Theorem 1.6.7
yields that the distribution of (U_{r:n} − U_{r−1:n})_{r=1}^{n+1} (where U_{n+1:n} = 1) is invariant under permutations of its components. This implies, in particular,
that all marginal distributions of (U_{r:n} − U_{r−1:n})_{r=1}^{n+1} of equal dimension are
equal.
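The distributional identity (1.6.2) can be checked on a single functional of the spacings. Our illustrative stdlib-Python sketch compares P{U_{1:n} > t} (the first spacing) with P{η_1/Σ_{j=1}^{n+1} η_j > t}; both should equal (1 − t)^n.

```python
import random

random.seed(3)

n, trials, t = 3, 200_000, 0.5
hit_spacing = hit_ratio = 0
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    if u[0] > t:                      # first uniform spacing U_{1:n} - 0
        hit_spacing += 1
    e = [random.expovariate(1.0) for _ in range(n + 1)]
    if e[0] / sum(e) > t:             # eta_1 / (eta_1 + ... + eta_{n+1})
        hit_ratio += 1

p_spacing = hit_spacing / trials
p_ratio = hit_ratio / trials
exact = (1 - t) ** n                  # P{first spacing > t} = (1 - t)^n
print(p_spacing, p_ratio, exact)
```

For n = 3 and t = 1/2 the exact value is 0.125; both estimates agree with it up to Monte Carlo error.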
Corollary 1.6.8. For every permutation τ on {1, ..., n + 1},
(1.6.3)
Let us also formulate Theorem 1.6.7 in terms of the order statistics U_{r:n}
themselves. Since U_{r:n} = Σ_{i=1}^{r} (U_{i:n} − U_{i−1:n}) we obtain
Corollary 1.6.9. If η_1, ..., η_{n+1} are i.i.d. standard exponential r.v.'s, then
(1.6.4)
Reformulation of Results
In a first step, the results above will be reformulated for the order statistics V_{i:n}
of n i.i.d. random variables uniformly distributed on (−1, 0). From Section 1.2
we know that
(1.6.5)
In the sequel, we shall deal with "negative" standard exponential r.v.'s
ξ_i = −η_i in place of the standard exponential r.v.'s η_i. Thus, ξ_1, ..., ξ_{n+1} are i.i.d.
random variables with common d.f. G_{2,1} (compare with (1.3.10)). We introduce
the partial sums
S_k = Σ_{i=1}^{k} ξ_i,  k = 1, ..., n + 1.  (1.6.6)
From Lemma 1.6.6(ii) it is obvious that S_k is a "negative" gamma r.v. with
parameter k having density x → e^x(−x)^{k−1}/(k − 1)!, x < 0. Corollary 1.6.9 is
equivalent to
Corollary 1.6.10.
(1.6.7)
Notice that −S_{n+1}/n → 1, n → ∞, w.p. 1, which in conjunction with (1.6.7)
indicates that, for every fixed k, asymptotically in distribution,
(1.6.8)
Recall that for k = 1 such a relation was proved in (1.3.14). For further
details see Section 5.3.
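The gamma limit indicated in (1.6.8) is visible already for moderate sample sizes. The following simulation (our sketch; constants chosen for speed, not from the text) estimates mean and variance of nU_{k:n} for k = 3 and compares them with the gamma values, which both equal k.

```python
import random

random.seed(4)

n, k, trials = 400, 3, 20_000
s = s2 = 0.0
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    x = n * u[k - 1]                 # n U_{k:n}
    s += x
    s2 += x * x
mean = s / trials
var = s2 / trials - mean * mean
# gamma(k) limit: expectation and variance both equal k
print(mean, var)
```

Both estimates come out close to 3, in line with the gamma distribution with parameter k.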
Next, we reformulate Malmquist's result.
Corollary 1.6.11. We have
(i) V_{2:n} − V_{1:n} =_d V_{1:n},
(ii) V_{n−r+1:n}/V_{n−r:n} =_d −V_{1:r} for r = 1, ..., n − 1,
(iii) S_1/S_2, S_2/S_3, ..., S_n/S_{n+1}, S_{n+1} are independent r.v.'s.  (1.6.9)
PROOF. (i) is obvious from (1.6.7). (ii) is immediate from Corollary 1.6.2(ii).
Ad (iii): From Corollary 1.6.2(i) we know that the first n components of the
vectors in (1.6.9) are independent. Moreover, it is immediate from Lemma
1.6.6(i) that (S_r/S_{n+1})_{r=1}^{n}, S_{n+1} are independent, and this property also holds for
(S_r/S_{r+1})_{r=1}^{n}, S_{n+1}. Thus, (iii) holds. □
Define
T_{i,α}(x) = (−x)^{−1/α} if i = 1,  −(−x)^{1/α} if i = 2,  −log(−x) if i = 3,
for x ∈ (−∞, 0),  (1.6.10)
with
W_{i,α} = 1 + log G_{i,α}
whenever −1 < log G_{i,α} < 0. Thus, the class of generalized Pareto d.f.'s arises
out of W_{2,1} in the same way as the extreme value d.f.'s arise out of G_{2,1}.
For α > 0 we have
W_{1,α}(x) = 1 − x^{−α},  x ≥ 1   ("Pareto"),
W_{2,α}(x) = 1 − (−x)^{α},  −1 ≤ x ≤ 0   ("uniform etc."),
W_3(x) = 1 − e^{−x},  x ≥ 0   ("exponential").  (1.6.11)
T_{2,α}(a)/T_{2,α}(b) = −T_{2,α}(−a/b).  (1.6.14)
X_{n:n}/X_{n−1:n}, ..., X_{2:n}/X_{1:n}, X_{1:n} are independent r.v.'s
if i = 1 and i = 2, respectively, for r = 1, ..., n − 1.
Exact Moments
Let U_{1:n}, ..., U_{n:n} again denote the order statistics of n i.i.d. random variables
with common uniform distribution on (0, 1). The first result is a nice application of Malmquist's lemma (see Corollary 1.6.3).
Lemma 1.7.1. Let 0 < r_1 < ... < r_k < r_{k+1} = n + 1, and let m_1, ..., m_k be integers such that r_i + Σ_{j=i}^{k} m_j ≥ 1 for i = 1, ..., k. Then, writing s_i = Σ_{j=i}^{k} m_j,
E ∏_{i=1}^{k} U_{r_i:n}^{m_i} = ∏_{i=1}^{k} b(r_i + s_i, r_{i+1} − r_i) / b(r_i, r_{i+1} − r_i)  (1.7.1)
= ∏_{i=1}^{k} E U_{r_i:r_{i+1}−1}^{s_i},
where b denotes the beta function.
EU_{r:n} = r/(n + 1) =: μ_{r,n},  (1.7.3)
E[(U_{r:n} − μ_{r,n})^2] = μ_{r,n}(1 − μ_{r,n})/(n + 2),  (1.7.4)
and, for r ≤ s,
E[(U_{r:n} − μ_{r,n})(U_{s:n} − μ_{s,n})] = μ_{r,n}(1 − μ_{s,n})/(n + 2),  (1.7.5)
and, for r ≤ s ≤ t,
E[(U_{r:n} − μ_{r,n})(U_{s:n} − μ_{s,n})(U_{t:n} − μ_{t,n})]
= 2μ_{r,n}(1 − 2μ_{s,n})(1 − μ_{t,n}) / ((n + 2)(n + 3)).  (1.7.6)
Next we state the expectation and the variance of the rth order statistic
X_{r:n} of i.i.d. standard exponential r.v.'s. From Theorem 1.6.1 we know that
X_{r:n} =_d Σ_{i=1}^{r} η_i/(n − i + 1) where η_1, ..., η_r are standard exponential r.v.'s
(thus, having common expectation and variance equal to 1). This implies
immediately that
EX_{r:n} = Σ_{i=1}^{r} (n − i + 1)^{−1} =: μ_{r,n}  (1.7.7)
and
E(X_{r:n} − μ_{r,n})^2 = Σ_{i=1}^{r} (n − i + 1)^{−2}.  (1.7.8)
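The uniform-order-statistic moments (1.7.3) and (1.7.4) can be verified exactly, without simulation, from the beta density of U_{r:n}. The sketch below (our code; the helper names are ours) computes E U_{r:n}^m as a ratio of beta functions and checks both formulas over a grid of (r, n).

```python
from math import gamma

def beta(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def moment(r, n, m):
    """E U_{r:n}^m via the beta(r, n - r + 1) density of U_{r:n}."""
    return beta(r + m, n - r + 1) / beta(r, n - r + 1)

checks = []
for n in range(1, 12):
    for r in range(1, n + 1):
        mu = r / (n + 1)                                  # (1.7.3)
        var = moment(r, n, 2) - moment(r, n, 1) ** 2
        checks.append(abs(moment(r, n, 1) - mu) < 1e-9
                      and abs(var - mu * (1 - mu) / (n + 2)) < 1e-9)  # (1.7.4)
print(all(checks))
```

The printed value is True: the variance identity μ_{r,n}(1 − μ_{r,n})/(n + 2) agrees with the second-moment computation for every r ≤ n ≤ 11.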
PROOF. Let F be the d.f. of ξ_1. Put C = n!/∏_{i=1}^{k+1} (r_i − r_{i−1} − 1)!, B =
{(x_1, ..., x_k): 0 < x_1 < ... < x_k < 1}, x_0 = 0 and x_{k+1} = 1. From Theorem
1.2.5(i) and Theorem 1.4.5 we get
= C ∫_B ∏_{i=1}^{k+1} (x_i − x_{i−1})^{r_i − r_{i−1} − 1} dx_1 ... dx_k ≤ C ∫_{(0,1)^k} ∏_{i=1}^{k+1} (x_i − x_{i−1})^{r_i − r_{i−1} − 1} dx_1 ... dx_k
where the final identity becomes obvious by using the quantile transformation. □
For g(x)
(1.7.9)
Next, we find some necessary and sufficient conditions which ensure that
moments of central order statistics exist and are finite if the sample size n is
sufficiently large.
Lemma 1.7.3. Let X_{i:j} be the ith order statistic of j i.i.d. random variables
ξ_1, ..., ξ_j with common d.f. F. Assume that
(1.7.10)
holds for some positive integers j, m and s. Then there exists
C > 0 such that for x ∈ (0, 1),
(1.7.11)
(1.7.12)
00
.,
= (s -1~(j _
fl
(l/x)dx
(1/(1 - x))dx
00.
n!/((r − 1)!(n − r)!)
and r - 1 - ks/m
JoeW- (x)jkx r1
ck/m (1 Xr-l-ks/m(1
Jo
(1 - xrr dx
0 as well as n - r - (j - s
00,
+ l)k/m ~ O.
Corollary 1.7.4. For every positive integer k and 0 < α < 1/2 the following three
conditions are equivalent:
(i) E|X_{r:n}|^k < ∞ for αn ≤ r ≤ (1 − α)n;
(ii) (1.7.13) holds for some δ > 0;
(iii) sup_{q∈(0,1)} |F^{−1}(q)|^δ q(1 − q) < ∞ for some δ > 0.  (1.7.14)
PROOF. If (i) holds for all n ≥ n_0, say, then the implication (1.7.10) ⇒ (1.7.11)
yields (ii) with δ = k/(n_0 α + 2).
Moreover, if (ii) holds then (1.7.11) ⇒ (1.7.12) yields (i) for n_0 =
[(1 + k(1 + 1/δ))/α]. Thus (i) and (ii) are equivalent.
To prove the equivalence of (ii) and (iii) notice that (1.7.13) holds iff there
exists δ > 0 such that
(a) |F^{−1}(q)|^δ q < 1  (1.7.13')
and
(b) |F^{−1}(q)|^δ (1 − q) ≤ 1.  (1.7.14')
g_{r,n} = f'/f + (r − 1) f/F − (n − r) f/(1 − F).  (1.7.16)
Lemma 1.7.5. The density f_{r:n} of X_{r:n} is unimodal if, and only if, there exists
u such that g_{r,n} ≥ 0 on (α(F), u) and g_{r,n} ≤ 0 on (u, ω(F)).
PROOF. Immediate from (1.7.15) and (1.7.16). Define u := sup{x: α(F) < x <
ω(F) and g_{r,n}(x) ≥ 0} if {x: α(F) < x < ω(F), g_{r,n}(x) ≥ 0} ≠ ∅, and u =
inf{x: α(F) < x < ω(F), g_{r,n}(x) < 0}, otherwise. □
f(x) =
{I!
if
-! < x < 0
O:5:x<1
Corollary 1.7.6. (i) If η is nonincreasing on (α(F), ω(F)) then f_{r:n} is unimodal.
(ii) If, in addition, g_{r,n}(u) = 0 for some u ∈ (α(F), ω(F)) and n ≥ 2 then u is the mode of f_{r:n}.
Medians
As a third functional parameter of order statistics we consider the median of
the distribution of an order statistic. Again we are interested in the relationship
between the underlying distribution and the distributions of order statistics.
Recall that a median u of a r.v. ξ is defined by the property that
(1.7.17)
(1.7.17) holds if F(u) = 1/2. Moreover, if the d.f. F of ξ is continuous, then
(1.7.17) is equivalent to the condition F(u) = 1/2.
Lemma 1.7.8. Let X_{i:2m+1} be the ith order statistic of i.i.d. random variables
ξ_1, ..., ξ_{2m+1} with common d.f. F where m is a positive integer. Then, every
median of ξ_1 is a median of X_{m+1:2m+1}.
PROOF. Let u be a median of ξ_1 so that F(u) ≥ 1/2. Since U_{m+1:2m+1} has a distribution symmetric about 1/2 we obtain
that
P{X_{m+1:2m+1} ≤ u} = P{U_{m+1:2m+1} ≤ F(u)} ≥ P{U_{m+1:2m+1} ≤ 1/2} = 1/2.
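Lemma 1.7.8 is easy to confirm by simulation. The following sketch (our code, not from the text) draws samples of size 2m + 1 from the standard exponential distribution, whose median is log 2, and checks that the sample median falls below log 2 with probability 1/2.

```python
import math
import random

random.seed(6)

m, trials = 7, 100_000
med = math.log(2.0)        # median of the standard exponential d.f.
below = 0
for _ in range(trials):
    xs = sorted(random.expovariate(1.0) for _ in range(2 * m + 1))
    if xs[m] <= med:       # X_{m+1:2m+1} is the sample median
        below += 1
p = below / trials
print(p)                   # close to 1/2
```

Indeed P{X_{m+1:2m+1} ≤ log 2} = P{Binomial(2m + 1, 1/2) ≥ m + 1} = 1/2 exactly, and the estimate reflects this.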
::;
t} -#
Lemma 1.7.10. Let X_{i:n} denote the ith order statistic of n i.i.d. random
variables ξ_1, ..., ξ_n with continuous d.f. F. Then every median of ξ_1 is a median
of M_{r,n}.
= t[P{Ur:n S
= t[P{Ur:n S
2'
Introductory Remarks
At the beginning let us touch on some essential definitions and properties
concerning the conditional distribution
P(Y ∈ · | X)
of Y given X.
In the present context it is always possible to factorize the conditional distribution P(Y ∈ · | X) by means of the conditional distribution
P(Y ∈ · | X = x) of Y given X = x. Moreover, P(Y ∈ B | X) is the composition
of P(Y ∈ B | X = ·) and X. By writing, in short, P(Y ∈ B | ·) in place of
P(Y ∈ B | X = ·) we have P(Y ∈ B | X) = P(Y ∈ B | ·) ∘ X.
Apart from a measurability condition and the fact that P(Y ∈ · | X = x) is a
probability measure the defining property of P(Y ∈ · | X) is
E(1_A(X) P(Y ∈ B | X)) = P{X ∈ A, Y ∈ B}  (1.8.1)
(1.8.2)
Lj~('IX)dIl2'
'IX = x)
of (X, Y)
given X = x
A, (X, Y)
BI
B2 }.
(1.8.3)
+ 1.
Theorem 1.8.1. Let F be a continuous d.f., and let 0 = r_0 < r_1 < ... < r_k <
r_{k+1} = n + 1. If α(F) = x_{r_0} < x_{r_1} < ... < x_{r_k} < x_{r_{k+1}} = ω(F) then the conditional distribution of (X_{1:n}, ..., X_{n:n}) given (X_{r_1:n}, ..., X_{r_k:n}) = (x_{r_1}, ..., x_{r_k}) is
the joint distribution of the r.v.'s Y_1, ..., Y_n which are characterized by the
following three properties:
(a) For i = 1, ..., k + 1, W_i := (Y_{r_{i−1}+1}, ..., Y_{r_i−1})
is the order statistic of r_i − r_{i−1} − 1 i.i.d. random variables with common d.f.
F_{i,x}.
(b) Y_{r_i} is a degenerate r.v. with fixed value x_{r_i} for i = 1, ..., k.
(c) W_i, i ∈ I, are independent.
PROOF. Put M := {1, ..., n} \ {r_1, ..., r_k}. In view of (1.8.3) it suffices to show
that the conditional distribution of the order statistics X_{i:n}, i ∈ M, given
X := (X_{r_1:n}, ..., X_{r_k:n}) = (x_{r_1}, ..., x_{r_k}) =: x is equal to the joint distribution of
the r.v.'s Y_j, j ∈ M. This will be verified by constructing the conditional density
in the way described above.
Denote by Q the probability measure corresponding to the d.f. F. Let f be
the Q^n-density of the order statistic (X_{1:n}, ..., X_{n:n}) and g the Q^k-density of X
(as computed in Theorem 1.5.2). Then, the conditional Q^{n−k}-density, say, f(· | x)
of X_{i:n}, i ∈ M, given X = x has the indicated representation
if g(x) > 0 where z denotes the vector (x_i)_{i∈M}. Notice that the condition
g(x) > 0 is equivalent to α(F) < x_{r_1} < ... < x_{r_k} < ω(F). Check that f(z | x) may
be written
f(zlx) =
ieI
F(X r'_1))r,-r,_,-1
A, g(Y)
C}.
(1.8.5)
Corollary 1.8.2. Let 1 ≤ s_1 < ... < s_m ≤ n. The conditional distribution of
(X_{s_1:n}, ..., X_{s_m:n}) given (X_{r_1:n}, ..., X_{r_k:n}) = (x_{r_1}, ..., x_{r_k}) is the joint distribution
of the r.v.'s Y_{s_1}, ..., Y_{s_m} with Y_i defined as in Theorem 1.8.1.
As an illustration to Theorem 1.8.1 and Corollary 1.8.2 we note several
special cases.
EXAMPLES 1.8.3. (i) The conditional distribution of X_{s:n} given X_{r:n} = x is the
distribution of
(a) the (s − r)th order statistic Y_{s−r:n−r} of n − r i.i.d. random variables with
d.f. F^{(x,∞)} (the truncation of F on the left of x) if 1 ≤ r < s ≤ n,
(b) the (r − s)th order statistic Y_{r−s:n−s} of n − s i.i.d. random variables with
d.f. F^{(−∞,x)} (the truncation of F on the right of x) if 1 ≤ s < r ≤ n,
(c) a degenerate r.v. with fixed value x if r = s.
(ii) More generally, if in (i) X_{s:n} is replaced by
(a) X_{s:n}, r < s ≤ n, then in (i)(a) Y_{s−r:n−r} has to be replaced by (Y_{1:n−r}, ...,
Y_{s−r:n−r}),
(b) X_{s:n}, 1 ≤ s < r, then in (i)(b) Y_{r−s:n−s} has to be replaced by (Y_{1:n−s}, ...,
Y_{r−s:n−s}).
(iii) The conditional distribution of X_{r+1:n}, ..., X_{s−1:n} given X_{r:n} = x and
X_{s:n} = y is the distribution of the order statistic (Y_{1:s−r−1}, ..., Y_{s−r−1:s−r−1})
of s − r − 1 i.i.d. random variables with d.f. F^{(x,y)} (the truncation of F on the
left of x and on the right of y).
(iv) (Markov property) The conditional distribution of X_{s:n} given X_{1:n} =
x_1, ..., X_{s−1:n} = x_{s−1} is the conditional distribution of X_{s:n} given X_{s−1:n} =
x_{s−1}. Hence, the sequence X_{1:n}, ..., X_{n:n} has the Markov property.
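Example 1.8.3(i)(a) can be checked by a conditioning simulation. In our illustrative sketch (constants and window width are ours) we condition on X_{2:5} falling in a small window around x = 0.4 for uniform samples; the conditional mean of X_{4:5} should then be close to that of the 2nd order statistic of 3 i.i.d. uniforms truncated to (0.4, 1), namely 0.4 + 0.6 · 2/4 = 0.7.

```python
import random

random.seed(7)

n, r, s = 5, 2, 4
x, h, trials = 0.4, 0.01, 400_000
total, count = 0.0, 0
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    if abs(u[r - 1] - x) < h:    # condition on X_{2:5} near 0.4
        total += u[s - 1]        # record X_{4:5}
        count += 1
cond_mean = total / count
print(cond_mean)                 # close to 0.7
```

The small window introduces only an O(h) bias, so the estimate agrees with 0.7 up to Monte Carlo error.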
PROOF. Let S_k be the permutation group on {1, ..., k}. For every permutation
τ ∈ S_k we get a representation in terms of the event A_τ, where (R_1, ..., R_n) is
the rank statistic (see P.1.30). Check that P(A_τ) = 1/k! for
every τ ∈ S_k. Using the fact that the order statistic and the rank statistic are
independent we obtain for every Borel set B
P((ζ_1, ..., ζ_k) ∈ B | X_{n−k:n} = x)
= Σ_{τ∈S_k} P(A_τ ∩ {(X_{n−τ(1)+1:n}, ..., X_{n−τ(k)+1:n}) ∈ B} | X_{n−k:n} = x)
= (1/k!) Σ_{τ∈S_k} P{(Y_{τ(1):k}, ..., Y_{τ(k):k}) ∈ B}
where the Y_{i:k} are the order statistics of the r.v.'s η_j. The last step follows from
Example 1.8.3(ii). By P.1.30,
P((ζ_1, ..., ζ_k) ∈ B | X_{n−k:n} = x)
Corollary 1.8.5. Let F be a continuous d.f., and let 1 ≤ r < s ≤ n. Then,
P{(1 − p)X_{r:n} + pX_{s:n} ≤ t} = F_{r,n}(t) − ∫_{−∞}^{t} P{(1 − p)x + pY_{s−r:n−r} > t} dF_{r,n}(x)
where F_{r,n} is the d.f. of X_{r:n}, and Y_{s−r:n−r} is the (s − r)th order statistic of n − r
i.i.d. random variables with common d.f. F^{(x,∞)} [the truncation of F on the left
of x].
This identity shows that it is possible to get an approximation to the dJ. of
the convex combination of two order statistics by using approximations to
distributions of single order statistics.
In Section 6.2 we shall study the special case of the convex combination of
consecutive order statistics Xr:n and X r+ 1 : n where Xr:n is a central order
statistic and, thus, Y.-r:n-r is a sample minimum.
PROOF OF COROLLARY 1.8.5.
P{(1 − p)X_{r:n} + pX_{s:n} ≤ t} = ∫ P{(1 − p)x + pY_{s−r:n−r} ≤ t} dF_{r,n}(x).
then X,(n):.
U,
L J(Xt(l):"""
'teSn
Xt(n):n) =
L J(~t(I)"'"
'reS"
~t(.)
(that is, the order statistic is invariant w.r.t. the permutation of the given r.v.'s).
+ F(z)J.
5. Let η be a (0, 1)-valued r.v. with d.f. F. Then, G^{−1}(η) has the d.f. F ∘ G for every
d.f. G.
6. Let η be a r.v. with uniform distribution on the interval (u_1, u_2) where 0 ≤ u_1 <
u_2 ≤ 1. Let F be a d.f. and put v_i = F^{−1}(u_i) [with the convention that F^{−1}(0) = α(F)
and F^{−1}(1) = ω(F)]. Then, F^{−1}(η) has the d.f.
G(x) = (F(x) − F(v_1))/(F(v_2) − F(v_1)).
7. If F(x) ≤ G(x) for every x then F^{−1}(q) ≥ G^{−1}(q) for every q.
8. Let ξ_i, i = 1, 2, 3, ... be r.v.'s which weakly converge to ξ_0. Then, there exist r.v.'s
ξ_i' such that ξ_i =_d ξ_i' and ξ_i', i = 1, 2, 3, ... converge pointwise to ξ_0' w.p. 1. [Hint:
Use Lemma 1.2.9.]
9. For the beta d.f. I_{r,s} with parameters r and s [compare with (1.3.8)] the following
recurrence relation holds:
(r + s)I_{r,s} = rI_{r+1,s} + sI_{r,s+1}.
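For integer parameters the beta d.f. can be written as a binomial tail sum, which makes the recurrence directly checkable. A small stdlib-Python sketch (our code; helper name is ours):

```python
from math import comb

def I(r, s, x):
    """Beta d.f. with integer parameters r, s:
    P(Beta(r, s) <= x) = sum_{j=r}^{r+s-1} C(r+s-1, j) x^j (1-x)^(r+s-1-j)."""
    m = r + s - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(r, m + 1))

ok = True
for r in range(1, 8):
    for s in range(1, 8):
        for x in (0.1, 0.35, 0.6, 0.9):
            lhs = (r + s) * I(r, s, x)
            rhs = r * I(r + 1, s, x) + s * I(r, s + 1, x)
            ok = ok and abs(lhs - rhs) < 1e-9
print(ok)
```

The printed value is True; for instance with r = s = 1 both sides reduce to 2x.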
10. Let X_{i:n} be the ith order statistic of n i.i.d. random variables with common d.f. F.
(i) If 1 ≤ r < s ≤ n then for u < v,
P{X_{r:n} ≤ u, X_{s:n} ≤ v}
= Σ_{i=r}^{n} Σ_{j=max(0,s−i)}^{n−i} [n!/(i! j! (n − i − j)!)] F(u)^i (F(v) − F(u))^j (1 − F(v))^{n−i−j}
and for u ≥ v,
P{X_{r:n} ≤ u, X_{s:n} ≤ v} = P{X_{s:n} ≤ v}.
[Hint: Use the fact that Σ_{k=1}^{n} [1_{(−∞,u]}(ξ_k), 1_{(u,v]}(ξ_k), 1_{(v,∞)}(ξ_k)] is a multinomial
random vector.]
(ii) Denote again by I_{r,s} the beta d.f. Then for u < v,
P{X_{r:n} ≤ u, X_{s:n} ≤ v} =
(r - 1)!
(Wilks, 1962)
11. (Transformation theorem)
Let ν be a finite signed measure with density f. Let T be a strictly monotone,
real-valued function defined on an open interval J. Assume that I = T(J) is an
open interval and that the inverse S: I → J of T is absolutely continuous. Then
|S'|(f ∘ S)1_I is a density of Tν (the measure induced by ν and T).
[Hint: Apply Hewitt & Stromberg, 1975, Corollary (20.5).]
12. Derive Theorem 1.3.2 from Theorem 1.4.1 by computing the density of the rth
marginal distribution in the usual way by integration.
(Hajek & Sidak, 1967, pages 39, 78)
13. Extension of Theorem 1.4.1: Suppose that the random vector (ξ_1, ..., ξ_n) has the
(Lebesgue) density g. Then, the order statistic (X_{1:n}, ..., X_{n:n}) has the density
f_{1,...,n:n} given by
f_{1,...,n:n}(x) = Σ_{τ∈S_n} g(x_{τ(1)}, ..., x_{τ(n)}),  x_1 < ... < x_n.
XI
B}.
15. If the continuity condition in P.1.14 is omitted then the result remains valid if
the sets B_j are open.
16. (Modifications of Malmquist's result)
Let 1 ≤ r_1 < ... < r_k ≤ n.
(i) Prove that the following r.v.'s are independent:
1 − U_{r_1:n}, (1 − U_{r_2:n})/(1 − U_{r_1:n}), ..., (1 − U_{r_k:n})/(1 − U_{r_{k−1}:n}).
Moreover,
(1 − U_{r_i:n})/(1 − U_{r_{i−1}:n}) =_d U_{n−r_i+1:n−r_{i−1}}
for i = 1, ..., k (with r_{k+1} = n + 1 and U_{n+1:n} = 1).
(iii) Prove that the following r.v.'s are independent:
U_{r_1:n}, (U_{r_2:n} − U_{r_1:n})/(1 − U_{r_1:n}), ..., (U_{r_k:n} − U_{r_{k−1}:n})/(1 − U_{r_{k−1}:n}).
Moreover,
for i = 1, ..., k (with r_0 = 0 and U_{0:n} = 0).
17. Denote by ξ_i independent standard normal r.v.'s. It is well known that (ξ_1^2 + ξ_2^2)/2
is a standard exponential r.v. Prove that
(U_{1:n}, ..., U_{n:n}) =_d ((Σ_{i=1}^{2r} ξ_i^2)/(Σ_{i=1}^{2(n+1)} ξ_i^2))_{r=1}^{n}.
18. Let ξ_1, ..., ξ_{k+1} be independent gamma r.v.'s with parameters s_1, ..., s_{k+1}.
(i) Then, (ξ_i/Σ_{j=1}^{k+1} ξ_j)_{i=1}^{k} has a k-variate Dirichlet distribution with parameter
vector (s_1, ..., s_{k+1}).
(Wilks, 1962)
(ii) Show that for 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1,
19. Let F_n denote the sample d.f. of n i.i.d. (0, 1)-uniformly distributed r.v.'s, and
η_1, ..., η_{n+1} independent standard exponential r.v.'s. Then,
F_n(t)
20.
(i) Let X_{i:n} denote the ith order statistic of n i.i.d. random variables with common
density f. As an extension of Theorem 1.6.1 one obtains that (X_{r:n} − X_{r−1:n})_{r=1}^{n}
has the density
x → n! ∏_{i=1}^{n} f(Σ_{j=1}^{i} x_j),  x_j > 0, j = 1, ..., n,
In particular, if f is the uniform density on (0, 1), the density equals
x → n!  if x_j > 0, j = 1, ..., n, and Σ_{j=1}^{n} x_j < 1,
--->
if x, y > 0 and x
+y<
1,
EU_{r:n}^{−j} = ∏_{m=1}^{j} (n − m + 1)/(r − m)  if 1 ≤ j < r.
24. Put λ_r = r/(n + 1), U_{n+1:n} = 1 and U_{0:n} = 0. Prove that
(i)
k+1
1, ... , k,
a.(U - A )2 - a,_ (U
- A )2
I
r"n
r,
I I
r'_I,n
r'_1 = 0
j=l
U'j:n-U'i_l:n
where ao = ak+1 = O.
26. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f.
F(x) = 1 − 1/log x for x ≥ e. Then, for every positive integer k,
E|X_{r:n}|^k = ∞.
27. For the order statistics X_{1:1} and X_{1:2} from the Pareto d.f. W_{1,1} we get
EX_{1:1} = ∞ and EX_{1:2} = 2.
28. Let M_{r,n} be the randomized sample median as defined in (1.7.19) and
N_{r,n} = X_{r:n} 1_{(1/2,1)}(ϑ) + X_{n−r+1:n} 1_{(0,1/2]}(ϑ)
(~"
... , ~n)'
AI(X',n,,,,,Xn,n = (n!f'
rESn
~n)
= (n!f'
l A (X,(lp'''''X,(n),n)'
rES n
f(X,(l),n'''',X,(n),n)'
(iii) If, in addition, ξ_1, ..., ξ_n are i.i.d. random variables then R_n and X_n are
independent and P{R_n = τ} = 1/n! for every τ ∈ S_n.
(Hájek & Šidák, 1967, pages 36-38)
31. (Positive dependence of order statistics)
Let X_{i:n} denote the ith order statistic of n i.i.d. random variables with common
continuous d.f. F. Assume that E|X_{i:n}| < ∞, E|X_{j:n}| < ∞ and E|X_{i:n}X_{j:n}| < ∞.
Then, Cov(X_{i:n}, X_{j:n}) ≥ 0.
(Proved by P. Bickel (1967) under stronger conditions.)
32. (Conditional independence under Markov property)
Let Y_1, ..., Y_n be real-valued r.v.'s which possess the Markov property. Let
1 ≤ r_1 < ... < r_k ≤ n. Then, conditioned on Y_{r_1}, ..., Y_{r_k}, the random vectors
(Y_1, ..., Y_{r_1}), (Y_{r_1+1}, ..., Y_{r_2}), ..., (Y_{r_k+1}, ..., Y_n) are independent; that is, the product
measure
P((Y_1, ..., Y_{r_1}) ∈ · | Y_{r_1}) × ...
Bibliographical Notes
Ordering observations according to their magnitude and identifying central
or extreme events is among the simplest of human activities. Thus, one can
give early references to the subject of order statistics by quotations from any
number of ancient books. For example, J. Tiago de Oliveira gives reference
F^{−1}(1 − 1/n)| ≤ ε} → 1,  n → ∞,
A similar result was also deduced by Dodd for various classes of distributions.
This development culminated in the article of R.A. Fisher and L.H.C.
Tippett (1928), who derived the three types of extreme value distributions and
discussed the stability problem. The limiting d.f. G_{1,α} was independently
discovered by M. Fréchet (1927). As mentioned by Wilks (1948), Fréchet's
result and that of Fisher and Tippett actually appeared almost simultaneously
in 1928.
We mention some of the early results obtained for central order statistics:
In 1902, K. Pearson derived the expectation of a spacing under a continuous
dJ. (Galton difference problem) and, in 1920, investigated the performance of
"systematic statistics" as estimators of the median by computing asymptotic
expectations and covariances of sample quantiles. Craig (1932) established
densities of sample quantiles in special cases. Thompson (1936) treated
confidence intervals for the q-quantile. Compared to the development in
extreme value theory the results concerning central order statistics were
obtained more sporadically than systematically.
It is clear that the considerations in this book concerning exact distributions of order statistics are not exhaustive. For example, it is worthwhile
studying distributions of order statistics in the discrete case as was done
by Nagaraja (1982, 1986), Arnold et al. (1984), and Rüschendorf (1985a).
B.C. Arnold and his co-authors showed that order statistics of a sample of size
n ≥ 3 possess the Markov property if, and only if, there does not exist an atom
x of the underlying d.f. F such that 0 < F(x−) and F(x) < 1. In that paper one
may also find expressions for the density of order statistics in the discrete case.
We also note that densities of order statistics in case of a random sample size
are given in an explicit form by Consul (1984); see also Smith (1984, pages 631,
632). Further results concerning exact distributions of order statistics may be
found in the books mentioned below.
Apart from the books of E.J. Gumbel (1958), L. de Haan (1970), H.A. David
(1981), J. Galambos (1987), M.R. Leadbetter et al. (1983), and S.I. Resnick
(1987), mentioned in the various sections, we refer to the books of Johnson
and Kotz (1970, 1972) (order statistics for special distributions), Barnett and
Lewis (1978) (outliers), and R.R. Kinnison (1985) (applied aspects of extreme
value theory). The reading of survey articles about order statistics written by
S.S. Wilks (1948), A. Rényi (1953), and J. Galambos (1984) can be highly
recommended. For an elementary, enjoyable introduction to classical results
of extreme value theory we refer to de Haan (1976).
CHAPTER 2
Multivariate Order Statistics
2.1. Introduction
Multivariate order statistics (including extremes) will be defined by taking
order statistics componentwise (in other words, we consider marginal ordering).
It is by no means self-evident that order statistics and extremes should be defined in this
particular way, and we do not deny that other definitions of multivariate order
statistics are perhaps of equal importance. Some other possibilities will be
indicated at the end of this section. One reason why our emphasis is placed on
this particular definition is that it fits our present program and purposes well.
In the sequel, the relations and arithmetic operations are always taken
componentwise. Given x = (x_1, ..., x_d) and y = (y_1, ..., y_d) we write
x ≤ y  if  x_i ≤ y_i,  i = 1, ..., d,  (2.1.1)
and
(2.1.2)
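Componentwise (marginal) ordering is straightforward to implement. The following short helper (our illustrative code, not from the text) returns the vector of rth order statistics coordinate by coordinate; note that the result need not be a point of the original sample.

```python
def marginal_order_statistics(sample):
    """sample: list of d-dimensional points (tuples).
    Returns (X_{1:n}, ..., X_{n:n}) under marginal ordering: the r-th entry
    collects the r-th order statistic of each coordinate separately."""
    cols = [sorted(col) for col in zip(*sample)]
    return [tuple(col[r] for col in cols) for r in range(len(sample))]

pts = [(3, 0.5), (1, 2.5), (2, 1.5)]
print(marginal_order_statistics(pts))
# -> [(1, 0.5), (2, 1.5), (3, 2.5)]; (1, 0.5) is not one of the sample points
```

This makes concrete the remark below that marginal order statistics, unlike ψ-order statistics, can leave the sample.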
Z"n
(2.1.4)
Z_{r:n}(ξ_{1,j}, ξ_{2,j}, ..., ξ_{n,j}).
We also write
(2.1.5)
(2.1.7)
= (X^{(1)}_{r:n}, X^{(2)}_{r:n}, ..., X^{(d)}_{r:n})
(2.1.8)
.. ,
~n
in general.
{Σ_{i=1}^{n} (1_{(−∞,t_1]}(ξ_{i,1}), ..., 1_{(−∞,t_d]}(ξ_{i,d})) ≥ r}
(2.1.9)
(2.1.10)
Thus, again the joint distribution of the r.v.'s X^{(j)}_{r:n}, (j, r) ∈ I, can be
represented by means of the distribution of a sum of independent random
vectors if the random vectors ξ_1, ..., ξ_n are independent. Note that a similar
result holds if maxima
X^{(1)}_{n(1):n(1)}, ..., X^{(d)}_{n(d):n(d)}
L
;=1
(2.1.11)
∫ ||y − x||_2 dQ(y) = min!  (2.1.12)
Total ψ-Ordering
Last but not least, we mention the ordering of multivariate data according to
the ranking method everyone is familiar with from daily life. The importance
of this concept is apparent.
Following Plackett (1976) we introduce a total order of the points x_1, ...,
x_n by means of a real-valued function ψ. Define
(2.1.13)
if
ψ(x) ≤ ψ(y).  (2.1.14)
Usually one is not only interested in the ranking of the data x_1, ..., x_n
expressed in the numbers 1, ..., n but also in the total information contained in
x_1, ..., x_n, thus getting the representation of the original data by
(2.1.15)
One advantage of this type of ordering compared to the marginal ordering
is that x_{i:n} is a point of the original sample. It is clear that the ordering
(2.1.15) heavily depends on the selection procedure represented by the function ψ.
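A ψ-ordering is a one-line sort once ψ is fixed. Our illustrative sketch uses ψ(x) = ||x − x_0||^2 as in the example below; the ordered points are themselves sample points.

```python
def psi_order_statistics(sample, x0):
    """Order d-dimensional sample points by psi(x) = ||x - x0||^2."""
    def psi(x):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, x0))
    return sorted(sample, key=psi)

pts = [(2.0, 0.0), (0.5, 0.5), (-1.0, 0.0)]
x0 = (0.0, 0.0)
print(psi_order_statistics(pts, x0))
# -> [(0.5, 0.5), (-1.0, 0.0), (2.0, 0.0)]
```

In contrast to the marginal ordering of Section 2.1, every ψ-order statistic here is one of the observed points.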
As an example, consider the function ψ(x) = ||x − x_0||^2. Other reasonable
functions ψ may be found in Barnett (1976) and Plackett (1976). Given the
random vectors ξ_1, ..., ξ_n let
(2.1.16)
denote the ψ-order statistics defined according to (2.1.15) with ψ(x) =
||x − x_0||^2. Define
(2.1.17)
which is the distance of the kth largest ψ-order statistic from the center x_0.
Obviously,
(2.1.18)
is the kth largest order statistic of the n i.i.d. univariate r.v.'s
||ξ_1 − x_0||_2, ..., ||ξ_n − x_0||_2 with common d.f.
(2.1.19)
(2.1.19)
Here
F(x_0, r) = P{ξ_1 ∈ B(x_0, r)}  (2.1.20)
where
B(x_0, r) = {x: ||x − x_0||_2 ≤ r}.  (2.1.21)
The author is grateful to Peter Hall for communicating a 3-line sketch of the
proof of this result. An extension can be found in P.2.1.
If F(x_0, ·) is continuous then we deduce from Theorem 1.5.1 that for the
ψ-maximum X_{n:n} the following identities hold:
P{X_{n:n} ∈ B} = ∫ P(X_{n:n} ∈ B | R_{n−1:n}) dP
= n(n − 1) ∫ P{ξ_1 ∈ B ∩ C(x_0, ·)} F(x_0, ·)^{n−2} dF(x_0, ·).  (2.1.22)
~o.
= P{~l
F(x, y) = 1 − (1 − F_1(x)) − (1 − F_2(y)) + L(x, y)
and
= 1 − (1 − F_1(x))^n − (1 − F_2(y))^n + L(x, y)^n.  (2.2.3)
nF^{n−1} f + n(n − 1)F^{n−2} (∂F/∂x)(∂F/∂y)  (2.2.4)
and
nL^{n−1} f + n(n − 1)L^{n−2} (∂L/∂x)(∂L/∂y).  (2.2.5)
R_2 = (−∞, x] × (y, ∞),  R_4 = (x, ∞) × (y, ∞)
(X,y)
Put
Notice that L_4 is the bivariate survivor function as mentioned above. We
have
L_4(x, y) = 1 − F_1(x) − F_2(y) + F(x, y).
Denote by N_j the number of the ξ_i in R_j; thus,
N_j = Σ_{i=1}^{n} 1_{R_j}(ξ_i).
= P{Σ_{i=1}^{n} 1_{(−∞,x]}(ξ_{i,1}) ≥ r, Σ_{i=1}^{n} 1_{(−∞,y]}(ξ_{i,2}) ≥ s}
= P{N_1 + N_2 ≥ r, N_1 + N_3 ≥ s}
= Σ_{k=r}^{n} Σ_{l=s}^{n} Σ_m P{N_1 = m, N_2 = k − m, N_3 = l − m}.
Using the multinomial distribution of (N_1, ..., N_4) we get
= Σ_{k=r}^{n} Σ_{l=s}^{n} Σ_{m=max(k+l−n,0)}^{min(k,l)} [n!/(m!(k − m)!(l − m)!(n − k − l + m)!)] L_1^m L_2^{k−m} L_3^{l−m} L_4^{n−k−l+m}.
P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ B} ≤ Σ_{i≠j} ∫_B f_1(x)f_2(y) dx dy + Σ_{i=1}^{n} ∫_B f(x, y) dx dy.
Recall that
(2h)^{−k} ∫_{x_1−h}^{x_1+h} ... ∫_{x_k−h}^{x_k+h} g(z) dz → g(x),  h → 0,  (2.2.6)
for (Lebesgue) almost all x (see e.g. Floret (1981), page 276).
The following lemma was established in cooperation with W. Kohne.
Lemma 2.2.2. If the bivariate i.i.d. random vectors ξ_1, ξ_2, ..., ξ_n have the
common density f then the random vector (X^{(1)}_{r:n}, X^{(2)}_{s:n}) has the density
f_{(r,s):n} = n! Σ_{m=0}^{min(r,s)−1} (L_1^m/m!) [L_2^{r−1−m} L_3^{s−1−m} L_4^{n−r−s+m+1} ...
L_5(x, y) = ∫_{−∞}^{x} f(u, y) du,  L_6(x, y) = ∫_{x}^{∞} f(u, y) du,
L_8(x, y) = ∫_{−∞}^{y} f(x, v) dv,  L_7(x, y) = ∫_{y}^{∞} f(x, v) dv.
[Figure: the regions S_1, ..., S_8 surround the square S_0 of side 2h centered at (x, y); S_j carries the probability ∫∫_{S_j} f(u, v) du dv for 1 ≤ j ≤ 8.]
for j = 5, ..., 8. First, observe that for all (x, y) such that (2) holds we have
h^{−2} P{N_0 ≥ 2} → 0 and h^{−2} P{N_0 = 1, Σ_{j=5}^{8} N_j ≥ 1} → 0
as h → 0, almost everywhere.
jt ~
<
2} ]-+
(3)
1("s):n
= {N_1 + N_2 + N_5 < r ≤ N_1 + N_2 + N_5 + N_0 + N_6 + N_8,
N_1 + N_3 + N_8 < s ≤ N_1 + N_3 + N_8 + N_0 + N_5 + N_7}.
Thus, for m = 0, ..., n,
{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m}
= {N_1 + N_2 < r ≤ N_1 + N_2 + 1, N_1 + N_3 < s ≤ N_1 + N_3 + 1,
N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m}
= {N_0 = 1, N_1 = m, N_2 = r − 1 − m, N_3 = s − 1 − m, N_5 = N_6 = N_7 = N_8 = 0}.  (4)
Moreover, for m = 0, ..., n, the event
{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 0, Σ_{j=5}^{8} N_j ≤ 2, N_1 = m}  (5)
is the disjoint union of the four events in which exactly one of N_5, N_6 and
exactly one of N_7, N_8 is equal to 1, the remaining two being 0, with
N_2 = r − 1 − m or r − 2 − m and N_3 = s − 1 − m or s − 2 − m accordingly.  (6)
Now (3) is immediate from (2), (5), and (6). The proof is complete. □
In the special case of the sample maximum (that is, r = n and s = n) we have
f_{(n,n):n} = nF^{n−1} f + n(n − 1)F^{n−2} L_5 L_8.  (2.2.7)
f:",/(U,Y)dU
(JF/8y)(x,y),
and
L 8(x,y) =
fco/(X, V) dv = (8F/8x)(x,y).
Let ξ = (ξ_1, ξ_2) again be a random vector with d.f. F and density f. Let
f_1(x) = ∫ f(x, v) dv and f_2(y) = ∫ f(u, y) du be the marginal densities, and let
F_1(x|y) = P(ξ_1 ≤ x | ξ_2 = y) = L_5(x, y)/f_2(y)
and
F_2(y|x) = P(ξ_2 ≤ y | ξ_1 = x) = L_8(x, y)/f_1(x).
Then,
f_{(n,n):n}(x, y) = nF^{n−1}(x, y)f(x, y)
+ n(n − 1)F^{n−2}(x, y)F_1(x|y)F_2(y|x)f_1(x)f_2(y).  (2.2.9)
for t > 0 where the reals bn and an > 0 are appropriate normalizing constants.
In order to calculate the finite dimensional marginal dJ.'s of Xn one needs the
following.
Lemma 2.2.3. Let 1 ≤ s_1 < s_2 < ... < s_k ≤ n, and let ξ_1, ..., ξ_n be i.i.d. random
variables with common d.f. F. Then,
(2.2.10)
(2.2.11)
and hence
P{X_{n:n} ≤ t} = F^n(t) = F_0^n(min(t_1, ..., t_d)).  (2.2.12)
Fn(C ntl
(ii) (Independence)
Secondly, assume that the components η_1, ..., η_d of ξ are independent.
Then it is clear that X^{(1)}_{n:n}, ..., X^{(d)}_{n:n} are independent. If G_{i(j),α(j)} is the d.f.
of η_j then with c_{n,j} and d_{n,j} as in (1.3.13):
F^n(c_{n,1}t_1 + d_{n,1}, ..., c_{n,d}t_d + d_{n,d}) → ∏_{j=1}^{d} G_{i(j),α(j)}(t_j).  (2.2.13)
see that X_{1:n} and X_{n:n} (and, thus, X^{(1)}_{n:n} and X^{(2)}_{n:n}) are asymptotically
independent. Thus, again we are getting independent r.v.'s in the limit.
Contrary to the univariate case the multivariate extreme value d.f.'s form
a nonparametric family of distributions. There is a simple device which enables
us to check whether a given d.f. is a multivariate extreme value d.f.
We say that a d-variate d.f. G is nondegenerate if the univariate marginals
are nondegenerate. A nondegenerate d-variate d.f. G is a limiting d.f. of sample
maxima if, and only if, G is max-stable, that is,
(2.2.14)
for some normalizing constants a_{n,j} > 0 and b_{n,j} (compare e.g. with Galambos
(1987), page 295, or Resnick (1987), Proposition 5.9).
If a d-variate d.f. is max-stable then it is easy to show that the univariate
marginals are max-stable and, hence, these d.f.'s have to be of the type G_{1,α},
G_{2,α} or G_3 with α > 0.
On the other hand, if the jth univariate marginal d.f. is G_{i(j),α(j)} for
j = 1, ..., d, one can take the normalizing constants as given in (1.3.13) to
verify the max-stability.
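For the Gumbel d.f. G_3(x) = exp(−e^{−x}) max-stability can be verified exactly, with a_n = 1 and b_n = log n in (2.2.14): G_3^n(x + log n) = G_3(x). Our small sketch checks this numerically.

```python
import math

def G3(x):
    """Univariate Gumbel (type 3) extreme value d.f."""
    return math.exp(-math.exp(-x))

ok = True
for n in (2, 5, 100):
    for x in (-1.0, 0.0, 2.5):
        # max-stability: G3(x + log n)^n == G3(x)
        ok = ok and abs(G3(x + math.log(n)) ** n - G3(x)) < 1e-12
print(ok)
```

A d-variate product of such marginals inherits max-stability coordinatewise, in line with the remarks above.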
Again the transformation technique works: Let G be a max-stable d.f. with
univariate marginals G_{i(j),α(j)} for j = 1, ..., d. Writing again T_{i,α} = G_{i,α}^{−1} ∘ G_{2,1}
we obtain that
(2.2.15)
defines a max-stable d.f. with univariate marginal d.f.'s G_{2,1} (the standard exponential d.f. on the negative half-line).
EXAMPLE 2.2.4. (i)
G(x, y) = exp(∫_{[0,1]} min(ux, (1 − u)y) dν(u)),  x, y < 0,  (2.2.16)
where ν is a measure on [0, 1] such that
∫_{[0,1]} u dν(u) = ∫_{[0,1]} (1 − u) dν(u) = 1.  (2.2.17)
Recall that the marginals are given by G_1(x) = lim_{y→∞} G(x, y) and G_2(y) =
lim_{x→∞} G(x, y) and hence (2.2.17) immediately implies that, in fact, the marginals in (2.2.16) are equal to G_{2,1}.
If ν is the Dirac measure putting mass 2 on the point 1/2 then G(x, y) =
exp(min(x, y)). If ν is concentrated on {0, 1} and puts masses 1 on the points
0 and 1 then G(x, y) = G_{2,1}(x)G_{2,1}(y).
The transformation technique immediately leads to the corresponding
representations for marginals different from G_{2,1}. Check that e.g.
G(x, y) = exp(−∫_{[0,1]} max(ue^{−x}, (1 − u)e^{−y}) dν(u))  (2.2.18)
defines a bivariate extreme value d.f. with marginals G_3.
Multivariate D.F.'s
This section will be concluded with some general remarks about multivariate
dJ.'s.
First recall that multivariate dJ.'s are characterized by the following three
properties:
(a) F is right continuous;
that is, if x_n ↓ x_0 then F(x_n) ↓ F(x_0).
(b) F is normed;
that is, if x_n = (x_{n,1}, ..., x_{n,d}) are such that x_{n,i} ↑ ∞ for every i = 1, ..., d
then F(x_n) ↑ 1; moreover, if x_n ≥ x_{n+1} and x_{n,i} ↓ −∞ for some i ∈ {1, ..., d}
then F(x_n) → 0, n → ∞.
(c) F is Δ-monotone;
that is, for all a = (a_1, ..., a_d) and b = (b_1, ..., b_d),
Δ_a^b F := Σ_{m∈{0,1}^d} (−1)^{d−Σ_i m_i} F(c(m)) ≥ 0,  (2.2.19)
where c(m)_i = b_i if m_i = 1 and c(m)_i = a_i if m_i = 0.
Lemma 2.2.6. Let F be a d-variate d.f. with univariate marginal d.f.'s F_i,
i = 1, ..., d. Then, for every x, y,
|F(x) − F(y)| ≤ Σ_{i=1}^{d} |F_i(x_i) − F_i(y_i)|.  (2.2.20)
PROOF. We get
|F(x) − F(y)| = |Σ_{i=1}^{d} [F(y_1, ..., y_{i−1}, x_i, ..., x_d) − F(y_1, ..., y_i, x_{i+1}, ..., x_d)]|
≤ Σ_{i=1}^{d} |F_i(x_i) − F_i(y_i)|.  □
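The bound (2.2.20) can be probed numerically on any concrete bivariate d.f. Our sketch uses an FGM-type d.f. with uniform marginals (our choice of example, with dependence parameter 0.5) and checks the inequality on a grid.

```python
def F(x, y):
    """FGM-type bivariate d.f. on [0,1]^2 with uniform marginals
    F1(x) = x, F2(y) = y and dependence parameter 0.5."""
    return x * y * (1 + 0.5 * (1 - x) * (1 - y))

ok = True
grid = [i / 10 for i in range(11)]
for x1 in grid:
    for y1 in grid:
        for x2 in grid:
            for y2 in grid:
                lhs = abs(F(x1, y1) - F(x2, y2))
                rhs = abs(x1 - x2) + abs(y1 - y2)   # sum of marginal increments
                ok = ok and lhs <= rhs + 1e-12
print(ok)
```

The printed value is True, as Lemma 2.2.6 guarantees for every d.f. with these marginals.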
distribution of the independent random vectors (η_{r_{i−1}+1}, ..., η_{r_i−1}), i ∈ I, where for
every i ∈ I the components of the vector are i.i.d. random vectors with common
distribution equal to the distribution of ξ_1 truncated to {x: z_{i−1} < ||x − x_0||_2 < z_i}
with z_0 = 0 and z_{k+1} = ∞.
2. (Distribution of ψ-order statistics)
(i) Prove the analogue of (2.1.21) for the kth ψ-order statistic X_{k:n}.
(ii) (Problem) Derive the asymptotic distributions of central and extreme ψ-order
statistics X_{k:n}.
(iii) (Problem) Derive the asymptotic distribution of the trimmed mean in (2.1.22)
for different centering random vectors ξ_0.
one gets
(i)
and
(ii)
1, ... , n
(iii)
{.
l~l
lA
m} :0;;:2': . (j)(
-1)j-mSj
m
if
J~m
keven
k - m odd.
4. (i)
U Ai = L (-ly- 1 Sj'
i=1
j=l
<
k
k
Pi~ Ai; j~
(-ly- 1 Sj
if
odd
even.
L (_ly+l hit)
j~l
hit)
=
1
~il
(ii) Moreover,
1 _ F(t) :0;;
(_l)j+l hit)
:2': j~l
k
k
if
= 1, ... , d.
odd
even.
(iii) Find C > 0 such that for every positive integer n and x
[0,1],
(~ (-I)jnhit))
if k even or k
= d.
(tl
(-l)jnhit)) - Cn- 1
if k odd or k
= d.
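The alternating Bonferroni bounds of Problem 4 can be verified exactly on an empirical measure, since the empirical measure is itself a probability measure; the events and probabilities below are an illustrative construction, not the book's:

```python
import random
from itertools import combinations

# Check of Problem 4: the partial sums sum_{j<=k} (-1)^{j-1} S_j bound
# P(union A_i) from above for k odd and from below for k even.
random.seed(1)
n_events, trials = 4, 20000
p = [0.2, 0.4, 0.6, 0.8]
# events A_i = {U <= p_i} for one uniform draw U per trial
draws = [random.random() for _ in range(trials)]

def prob(indices):
    # empirical probability of the intersection of the chosen events
    return sum(all(u <= p[i] for i in indices) for u in draws) / trials

union = sum(any(u <= pi for pi in p) for u in draws) / trials
S = [sum(prob(c) for c in combinations(range(n_events), j))
     for j in range(1, n_events + 1)]

for k in range(1, n_events + 1):
    partial = sum((-1) ** (j - 1) * S[j - 1] for j in range(1, k + 1))
    if k % 2 == 1:
        assert union <= partial + 1e-9   # k odd: upper bound
    else:
        assert partial <= union + 1e-9   # k even: lower bound
```

For k = n the partial sum is the full inclusion-exclusion identity, so equality holds there.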
F(x, y) = { 2xy                    if x + y ≤ 1,
          { 2xy − (x + y − 1)²     if x + y ≥ 1,

for 0 ≤ x, y ≤ 1.
The density f_{(n,n),n} of (X^{(1)}_{n:n}, X^{(2)}_{n:n}) is given by

f_{(n,n),n}(x, y) = nF^{n−1}(x, y)f(x, y) + n(n − 1)F^{n−2}(x, y)(∂F/∂x)(x, y)(∂F/∂y)(x, y).
9. (i) Prove that a bivariate extreme value d.f. G with standard "negative" exponential marginals (see (2.2.16)) can be written

G(x, y) = exp[(x + y) d(x/(x + y))],   x, y < 0,

where d(w) = ∫_{[0,1]} max(uw, (1 − u)(1 − w)) dν(u).
10. A d-variate d.f. with marginals G_{2,1} is max-stable if, and only if,

G(x) = exp( ∫_U min(u_1 x_1, ..., u_d x_d) dμ(u) )

where U = {u: Σ_{i=1}^d u_i = 1, u_i ≥ 0} and ∫ u_i dμ(u) = 1 for i = 1, ..., d.
11. (i) Prove that

P{min(η_1/(1 − w), η_2/w) ≤ t} = exp[t d(w)],   t < 0.

(ii) Let (η_{1,i}, η_{2,i}), i = 1, ..., n, be i.i.d. random vectors with common d.f. G as given in P.2.9(i). Define

l_n(w) = [n^{−1} Σ_{i=1}^n min(η_{1,i}/(1 − w), η_{2,i}/w)]^{−1}

and prove that Variance(1/l_n(w)) = 1/(n d(w)²).
12. (Multivariate transformation technique)
Let ξ = (ξ_1, ..., ξ_d) be a random vector with continuous d.f. F. We use the notation

F_i(·|x_{i−1}, ..., x_1) = P(ξ_i ≤ · | ξ_{i−1} = x_{i−1}, ..., ξ_1 = x_1).

(i) Define T = (T_1, ..., T_d) by T_1(x) = F_1(x_1) and T_i(x) = F_i(x_i|x_{i−1}, ..., x_1) for i = 2, ..., d. Prove that T_1(ξ), ..., T_d(ξ) are i.i.d. (0, 1)-uniformly distributed r.v.'s.
(ii) Define T^{−1}(q) = (S_1(q), ..., S_d(q)) by

S_1(q) = F_1^{−1}(q_1),
S_i(q) = F_i^{−1}(q_i|S_{i−1}(q), ..., S_1(q))

for i = 2, ..., d.
Prove that P{T^{−1}(T(ξ)) = ξ} = 1. Moreover, if η_1, ..., η_d are i.i.d. (0, 1)-uniformly distributed r.v.'s then T^{−1}(η_1, ..., η_d) has the d.f. F.
P{X.,. =
~j
for some j
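The transformation T of P.2.12 (the Rosenblatt transformation) is easy to illustrate with a concrete bivariate distribution; the example distribution below is my choice, not the book's: ξ_1 ~ U(0, 1) and, given ξ_1 = x, ξ_2 ~ U(0, x), so that F_1(x) = x and F_2(y|x) = y/x.

```python
import random

# Sketch of the multivariate transformation technique for one
# illustrative bivariate distribution.
random.seed(2)
n = 50000
sample = []
for _ in range(n):
    x1 = random.random()
    x2 = x1 * random.random()        # xi_2 | xi_1 = x1 is U(0, x1)
    sample.append((x1, x2))

# T(x) = (F_1(x_1), F_2(x_2 | x_1)) should be i.i.d. U(0, 1)
t1 = [x1 for x1, x2 in sample]
t2 = [x2 / x1 for x1, x2 in sample]

mean1 = sum(t1) / n
mean2 = sum(t2) / n
cov = sum((a - mean1) * (b - mean2) for a, b in zip(t1, t2)) / n
assert abs(mean1 - 0.5) < 0.01 and abs(mean2 - 0.5) < 0.01
assert abs(cov) < 0.01               # transformed coordinates uncorrelated
```

Here T_2(x) = x_2/x_1 recovers the second uniform used in the construction exactly, which is the content of part (i) in this special case.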
Bibliographical Notes
It is likely that Gini and Galvani (1929) were the first who considered
the bivariate median defined by the property of minimizing the sum of the
deviations w.r.t. the Euclidean norm (see (2.1.11)). This is the "spatial" median
as dealt with by Oja and Niinimaa (1985). In that paper the asymptotic
performance of a "generalized sample median" as an estimator of the symmetry
center of a multivariate normal distribution is investigated. Another notable
article related to this is Isogai (1985).
The result concerning the conditional distribution of exceedances (see
(2.1.21)) and its extension in P.2.1 was e.g. applied by Moore and Yackel (1977)
and Hall (1983) in connection with nearest neighbor density estimators;
however, a detailed proof does not seem to exist.
New insight into the asymptotic stochastic behavior of the convex hull of data points is provided by the recent work of Eddy and Gale (1981) and Brozius and de Haan (1987). This approach connects the asymptotic treatment of convex hulls with that of multivariate extremes (w.r.t. the marginal ordering).
For a different representation of the density of multivariate order statistics
we refer to Galambos (1975).
In the multivariate set-up we only made use of the transformation technique to transform a multivariate extreme value d.f. to a d.f. with predetermined margins. P.2.12 describes the multivariate transformation technique as developed by Rosenblatt (1952), O'Reilly and Quesenberry (1973), Raoult et al. (1983), and Rüschendorf (1985b). It does not seem to be
possible to make this technique applicable to multivariate order statistics
(with the exception of concomitants).
Further references concerning multivariate order statistics will be given in
Chapter 7.
CHAPTER 3
Let ξ_1, ..., ξ_n be i.i.d. random variables with Eξ_1 = 0, |ξ_1| ≤ 1 and Eξ_1² = σ_0². Then, for every ε ≥ 0 and 0 ≤ t ≤ (nσ_0²)^{1/2},

P{(nσ_0²)^{−1/2} Σ_{i=1}^n ξ_i ≥ ε} ≤ exp(−εt + (3/4)t²).    (3.1.1)

From this one obtains: for every ε ≥ 0,

P{(n^{1/2}/σ)(U_{r:n} − μ) ≤ −ε} ≤ exp(−ε²/(3(1 + ε/(σn^{1/2}))))

and

P{(n^{1/2}/σ)(U_{r:n} − μ) ≥ ε} ≤ exp(−ε²/(3(1 + ε/(σn^{1/2}))))    (3.1.2)

where μ = r/(n + 1) and σ = (μ(1 − μ))^{1/2}.
PROOF. (I) First, we prove the upper bound of P{(n^{1/2}/σ)(U_{r:n} − μ) ≤ −ε}. W.l.g. assume that α := μ − εσ/n^{1/2} > 0. Otherwise, the upper bound in (3.1.2) is trivial. In particular, α ∈ (0, 1). By (1.1.8), putting ε_0 = (r − nα)/(nα(1 − α))^{1/2} and ξ_i = 1_{(−∞,α]}(η_i) − α, we get

P{(n^{1/2}/σ)(U_{r:n} − μ) ≤ −ε} = P{Σ_{i=1}^n 1_{(−∞,α]}(η_i) ≥ r}
= P{Σ_{i=1}^n ξ_i ≥ r − nα}
≤ exp(−ε_0 t + (3/4)t²)

if 0 ≤ t ≤ (nα(1 − α))^{1/2} where the last step is an application of (3.1.1) to ξ_i and ε = ε_0. It is easy to see that t = 2ε(α(1 − α))^{1/2}/(3σ(1 + ε/(σn^{1/2}))) fulfills the condition 0 ≤ t ≤ (nα(1 − α))^{1/2}. Moreover, −ε_0 t + (3/4)t² ≤ −ε²/(3(1 + ε/(σn^{1/2}))) since ε_0 ≥ εσ/(α(1 − α))^{1/2} and α(1 − α)/σ² ≤ 1 + ε/(σn^{1/2}). This proves the first inequality.
(II) Secondly, recall that U_{r:n} is distributed as 1 − U_{n−r+1:n} (see Example 1.2.2), hence we obtain from part (I) that

P{(n^{1/2}/σ)(U_{r:n} − μ) ≥ ε} = P{(n^{1/2}/σ)(U_{n−r+1:n} − (n − r + 1)/(n + 1)) ≤ −ε}
≤ exp(−ε²/(3(1 + ε/(σn^{1/2})))).  □
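The two-sided bound can be probed by simulation, using the fact that U_{r:n} has a Beta(r, n − r + 1) distribution; note that μ = r/(n + 1) and σ = (μ(1 − μ))^{1/2} are my reading of the garbled statement, not a verbatim quotation:

```python
import math
import random

# Monte Carlo probe of the two-sided exponential bound (3.1.4).
random.seed(3)
n, r, reps = 100, 30, 20000
mu = r / (n + 1)
sigma = math.sqrt(mu * (1 - mu))

# U_{r:n} is Beta(r, n - r + 1) distributed
draws = [random.betavariate(r, n - r + 1) for _ in range(reps)]

for eps in (1.0, 1.5, 2.0):
    empirical = sum(abs(math.sqrt(n) / sigma * (u - mu)) >= eps
                    for u in draws) / reps
    bound = 2 * math.exp(-eps ** 2 / (3 * (1 + eps / (sigma * math.sqrt(n)))))
    # the empirical tail probability must lie below the bound
    assert empirical <= bound
```

The bound is quite loose at these parameter values, so the assertion holds with a wide margin despite Monte Carlo noise.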
The right-hand side of (3.1.2) can be written in a simpler form for a special choice of ε; in particular one obtains bounds of the form

P{(n^{1/2}/σ)|U_{r:n} − μ| ≥ ε(s, n)} ≤ 2n^{−s}.    (3.1.3)

Moreover, combining the two inequalities of (3.1.2),

P{(n^{1/2}/σ)|U_{r:n} − μ| ≥ ε} ≤ 2 exp(−ε²/(3(1 + ε/(σn^{1/2})))),   ε ≥ 0.    (3.1.4)

For the lower tail of U_{r:n} we have, for every ε ≥ 0,

P{U_{r:n} ≤ εμ} ≤ (eε)^r/(2πr)^{1/2}.

PROOF. Since

P{U_{r:n} ≤ εμ} = (n!/((r − 1)!(n − r)!)) ∫_0^{εμ} x^{r−1}(1 − x)^{n−r} dx
≤ (n!/((r − 1)!(n − r)!)) ∫_0^{εμ} x^{r−1} dx ≤ (n^r/r!)(εμ)^r ≤ (r^r/r!)ε^r ≤ (eε)^r/(2πr)^{1/2},

the final step being Stirling's formula r! ≥ (r/e)^r(2πr)^{1/2}, the assertion follows.  □
J.L)
~ :~;e)}
(3.1.5)
g(ll)
h(x)
= x
(3.1.6)
If one needs an upper bound of the left-hand side of (3.1.5) for a fixed sample size n then one has to formulate the smoothness condition for F in a more explicit way so that the capital O in (3.1.6) can be replaced by a constant. This should always be done for the given specific problem.
PROOF. We have

E|(n^{1/2}/σ)(U_{r:n} − μ)|^j = ∫_0^∞ x^j dG(x) = j ∫_0^∞ x^{j−1}(1 − G(x)) dx

so that, by writing G(x) = P{(n^{1/2}/σ)|U_{r:n} − μ| ≤ x}, the exponential bound in (3.1.4) applied to 1 − G(x) yields

E|(n^{1/2}/σ)(U_{r:n} − μ)|^j ≤ 2j ∫_0^∞ x^{j−1} exp(−x²/(3(1 + x/(σn^{1/2})))) dx.  □
Then there exists a constant C > 0 such that for every real u and integers n, k and r ∈ {1, ..., n} with 1 ≤ i := r − ks ≤ m := n − (j + 1)k the following two inequalities hold:
PROOF. We shall only verify the upper bound of E(|X_{r:n}|^k 1_{{X_{r:n} > u}}). The other inequality may be established in a similar way.
Since X_{r:n} is distributed as F^{−1}(U_{r:n}) and F^{−1}(q) > u iff q > F(u) we get

E(|X_{r:n}|^k 1_{{X_{r:n} > u}}) = (1/B(r, n − r + 1)) ∫_{F(u)}^1 |F^{−1}(x)|^k x^{r−1}(1 − x)^{n−r} dx
= (1/B(r, n − r + 1)) ∫_{F(u)}^1 (|F^{−1}(x)| x^s(1 − x)^{j−s+1})^k x^{i−1}(1 − x)^{m−i} dx
≤ (B(i, m − i + 1)/B(r, n − r + 1)) C^k P{U_{i:m} > F(u)}

where C is the constant of (1.7.11). Since P{U_{i:m} > F(u)} = P{X_{i:m} > u} the proof is complete.  □
P{|G_n^{−1}(q) − q| > ((log n)/n)^{1/2} K(q, s, n) for some q ∈ (0, 1)} ≤ B(s)n^{−s}

where K(q, s, n) = C(s) max{(q(1 − q))^{1/2}, ((s + 1)(log n)/n)^{1/2}}.

PROOF. By (3.1.3), applied to each of the n order statistics, the exceedance probability is ≤ 2n^{−s} for each fixed q ∈ (0, 1), and the assertion follows. Equivalently,

P{ n^{1/2}|G_n^{−1}(q) − q| / max{(q(1 − q))^{1/2}, ((log n)/n)^{1/2}} > C(s)(log n)^{1/2} for some q ∈ (0, 1) } ≤ B(s)n^{−s}.    (3.1.7)
p{
O<P~~~2<1
I(F- l ),(p)l
Then, for every s > 0 there exist constants B(s, δ) > 0 and C(s, δ) > 0 (only depending on s and δ) such that

(i)

P{ sup_{q_1 ≤ p ≤ q_2} n^{1/2}|F_n^{−1}(p) − F^{−1}(p)| / max{(p(1 − p))^{1/2}, ((log n)/n)^{1/2}} > C(s, δ)(log n)^{1/2} } ≤ B(s, δ)n^{−s},

and if, in addition, the derivative (F^{−1})′ satisfies a Lipschitz condition of order δ
(ii)

P{ sup_{q_1 ≤ p_1 ≤ p_2 ≤ q_2} ... },

the supremum being taken over expressions of the form

F^{−1}(p_2 + y(G_n^{−1}(p_2) − p_2)) − F^{−1}(p_1 + y(G_n^{−1}(p_1) − p_1)).

From the proof of Lemma 3.1.7 it is obvious that (i) still holds if F^{−1} satisfies a Lipschitz condition of order 1.
An expansion is a representation

|g_y(x) − Σ_{i=0}^{j} g_{i,y}(x)| ≤ C(x)h(y)^{j+1},   y ∈ Γ,   j = 0, ..., m − 1,    (3.2.1)

which is said to hold uniformly over A_0 if sup{C(x): x ∈ A_0} < ∞. If sup{h(y): y ∈ Γ} < ∞,
|f(y) − Σ_{i=0}^{m−1} (f^{(i)}(y_0)/i!)(y − y_0)^i| ≤ C|y − y_0|^m.
For the standard normal d.f. Φ with density φ, repeated partial integration yields

1 − Φ(y) = (φ(y)/y) (1 + Σ_{i=1}^{m−1} (−1)^i 1·3·5···(2i − 1) y^{−2i}) + (−1)^m 1·3·5···(2m − 1) ∫_y^∞ (φ(x)/x^{2m}) dx    (3.2.2)

for every positive integer m and y > 0 (where Σ_{i=1}^0 equals zero by convention). An application of (3.2.2) in the cases m = 1 and m = 2 leads to

(φ(y)/y)(1 − y^{−2}) ≤ 1 − Φ(y) ≤ φ(y)/y    (3.2.3)

for y > 0.
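The bounds in (3.2.3) are classical and can be checked directly, computing the normal tail via the complementary error function (the grid of y-values is an arbitrary choice):

```python
import math

# Check of (3.2.3): (phi(y)/y)(1 - y**-2) <= 1 - Phi(y) <= phi(y)/y for y > 0.
def phi(y):
    # standard normal density
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def upper_tail(y):
    # 1 - Phi(y) expressed through the complementary error function
    return 0.5 * math.erfc(y / math.sqrt(2))

for y in (0.5, 1.0, 2.0, 3.0, 5.0):
    tail = upper_tail(y)
    assert tail <= phi(y) / y
    assert tail >= (phi(y) / y) * (1 - y ** -2)
```

For large y the two bounds pinch together, which is exactly the m = 1 case of the expansion (3.2.4).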
By means of (3.2.2) we get an expansion of (1 − Φ(y))y/φ(y) in powers of h(y) = y^{−2}. We have

|(1 − Φ(y))y/φ(y) − (1 + Σ_{i=1}^{m−1} (−1)^i 1·3·5···(2i − 1) y^{−2i})| ≤ C_m y^{−2m}.    (3.2.4)
sup_t |H_n(t) − (Φ(t) + φ(t) Σ_{i=1}^{m−1} n^{−i/2} L_i(t))| ≤ Cn^{−m/2}.    (3.2.5)
Lemma 3.2.4. Let P_y and P_{0,y} be probability measures and let ν_{i,y} be finite signed measures on a measurable space (S, ℬ).
If P_{0,y} + Σ_{i=1}^{m−1} ν_{i,y} is an expansion of P_y, y ∈ Γ, uniformly over ℬ arranged in powers of h(y), y ∈ Γ, then there exists an expansion P_{0,y} + Σ_{i=1}^{m−1} μ_{i,y} such that μ_{i,y} are finite signed measures with μ_{i,y}(S) = 0. Moreover, one may take

...,   i = 1, ..., m − 1,    (3.2.6)

or

...,   i = 1, ..., m − 1.    (3.2.7)

PROOF. Straightforward by using the fact that |Σ_{i=1}^j ν_{i,y}(S)| ≤ Ch(y)^{j+1} for some constant C > 0.  □
According to Lemma 3.2.4 we can assume w.l.g. that the term ν_{i,y} of an expansion has the property ν_{i,y}(S) = 0. Another useful tool in this context is the following.
Lemma 3.2.5. Let P_y, P_{0,y} and ν_{i,y} be as in Lemma 3.2.4. Suppose that there exists C > 0 such that

sup_{B∈ℬ} | P_y(B) − (P_{0,y} + Σ_{i=1}^{m−1} ν_{i,y})(B) / (1 + Σ_{i=1}^{m−1} ν_{i,y}(S)) | ≤ Ch(y)^m    (3.2.8)

and |ν_{j,y}(S)| ≤ Ch(y)^j for every j = 0, ..., m − 1 and y ∈ Γ [where (3.2.8) has to hold whenever 1 + Σ_{i=1}^{m−1} ν_{i,y}(S) > 0].
Then, P_{0,y} + Σ_{i=1}^{m−1} μ_{i,y}, y ∈ Γ, is an expansion of P_y, y ∈ Γ, uniformly over ℬ arranged in powers of h(y) where μ_{i,y} is inductively defined by

μ_{i,y} = ν_{i,y} − Σ_{k=1}^{i−1} ν_{k,y}(S) μ_{i−k,y},   i = 1, ..., m − 1.    (3.2.9)

PROOF. First notice that from the inequality |ν_{i,y}(S)| ≤ Ch(y)^i it is immediate by induction over i = 1, ..., m − 1 that

|μ_{i,y}|(S) ≤ Ch(y)^i.    (3.2.10)

A straightforward computation based on (3.2.8)-(3.2.10) then bounds |P_y(B) − (P_{0,y} + Σ_{i=1}^{m−1} μ_{i,y})(B)| by Ch(y)^m uniformly over ℬ. Thus, the assertion is proved for those y for which h(y) is sufficiently small. By (3.2.10) again it can easily be seen that, otherwise, the assertion trivially holds by choosing the constant C sufficiently large.  □
By induction over i = 1, ..., m − 1 it is easy to see that the signed measures μ_{i,y} in Lemma 3.2.5 already fulfill the condition μ_{i,y}(S) = 0.
Expansions of D.F.'s
An expansion of probability measures which holds uniformly over all measurable sets on the real line yields an expansion

P_{0,y}(−∞, t] + Σ_{i=1}^{m−1} ν_{i,y}(−∞, t]

of d.f.'s.
Assume that P_{0,y} = N(0, 1), ν_{i,y} has a density φR_{i,y} where R_{i,y} is a polynomial and the mass of ν_{i,y} is equal to zero. Then, the expansion of the d.f.'s can always be written in the form

Φ(t) + φ(t) Σ_{i=1}^{m−1} L_{y,i}(t)

where L_{y,i} are polynomials. This is immediate from the following lemma which yields that one can find polynomials L_{y,i} such that (φL_{y,i})′ = φR_{y,i}.
Lemma 3.2.6. For every positive integer k,

φ(x)(x^{2k} − 1·3·5···(2k − 1)) = [ −φ(x) Σ_{i=1}^k a_i x^{2i−1} ]′    (3.2.11)

where a_k = 1 and a_i = (2i + 1)a_{i+1}, i = 1, ..., k − 1. Secondly,

φ(x)x^{2k−1} = [ −φ(x) Σ_{i=1}^k a_i x^{2(i−1)} ]′    (3.2.12)

where now a_k = 1 and a_i = 2i a_{i+1}, i = 1, ..., k − 1.

PROOF. Straightforward by differentiation. Notice that

a_1 = 1·3·5···(2k − 1) = ∫ x^{2k} φ(x) dx.    (3.2.13)  □
Two further technical lemmas that provide the basic tools for proving
expansions for extreme and central order statistics will be given in Appendix 2.
Given random vectors ζ and η we consider the variational distance

sup_{B∈ℬ} |P{ζ ∈ B} − P{η ∈ B}|.    (3.3.1)

In this sequel, we shall write sup_B in place of sup_{B∈ℬ}. Let Q_0 and Q_1 denote the distributions of ζ and η. Then, we write    (3.3.2)

   (3.3.3)

PROOF. Check that

sup_B (Q_0(B) − Q_1(B)) = ∫_{{f_0 > f_1}} (f_0 − f_1) dμ = 2^{−1} ∫ |f_0 − f_1| dμ.  □
PROOF. Since ∫ f_n dμ = ∫ f_0 dμ = 1 we have

∫ |f_n − f_0| dμ = 2 ∫ (f_0 − f_n)^+ dμ.    (1)

Moreover, f_0 ≥ (f_0 − f_n)^+ ≥ 0 and (f_0 − f_n)^+ → 0 μ-a.e. Therefore, the dominated convergence theorem implies that ∫ |f_n − f_0| dμ → 0.  □
A short look at the proof above reveals also that the following extension holds: if

lim_n ∫ f_n dμ = ∫ f_0 dμ   and   lim inf_n f_n = f_0   μ-a.e.

then

lim_n ∫ |f_n − f_0| dμ = 0.

Moreover, the following conditions are equivalent:
(i) lim_n ∫ |f_n − f_0| dμ = 0;
(ii) lim_n ∫ f_n dμ = ∫ f_0 dμ and for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that lim_n f_{k(n)} = f_0 μ-a.e.;
(iii) for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that lim_n ∫ f_{k(n)} dμ = ∫ f_0 dμ and lim inf_n f_{k(n)} ≥ f_0 μ-a.e.

Condition (iii) implies that there exists k(n) = i(j(n)) such that

lim_n (f_0 − f_{k(n)})^+ = 0   μ-a.e.

Thus, by repeating the arguments of the proof of Lemma 3.3.2 we obtain the desired conclusion.  □
The following version of the Scheffé lemma will be particularly useful in cases where the measurable space varies with n.
Lemma 3.3.5. Let g_n and f_n be nonnegative, measurable functions. Assume that ∫ g_n dμ_n, n = 1, 2, 3, ..., is a bounded sequence, and that lim_n ∫ (g_n − f_n) dμ_n = 0. Then the following three conditions are equivalent:

(i) lim_n ∫ |f_n − g_n| dμ_n = 0;
(ii) lim_n ∫ |f_n/g_n − 1| g_n dμ_n = 0;
(iii) lim_n ∫_{{|f_n/g_n − 1| ≥ ε}} g_n dμ_n = 0 for every ε > 0.

PROOF. (ii) ⇒ (iii):

∫_{{|f_n/g_n − 1| ≥ ε}} g_n dμ_n ≤ ε^{−1} ∫ |f_n/g_n − 1| g_n dμ_n ≤ ε^{−1} ∫ |g_n − f_n| dμ_n.

(iii) ⇒ (i): For ε > 0 put B = B(n, ε) = {g_n > 0, |f_n/g_n − 1| < ε}. If (iii) holds then

∫ |f_n − g_n| dμ_n ≤ ∫_B |f_n/g_n − 1| g_n dμ_n + ∫_{B^c} f_n dμ_n + ∫_{B^c} g_n dμ_n ≤ 2ε ∫ g_n dμ_n + 3ε

for n sufficiently large, and the boundedness of ∫ g_n dμ_n yields (i).  □
Finally, Lemma 3.3.5 will be formulated for the particular case of probability
measures.
Corollary 3.3.6. For probability measures Q_n and P_n with μ_n-densities f_n and g_n the following two assertions are equivalent:
(i) lim_n ∫ |f_n − g_n| dμ_n = 0;
(ii) lim_n P_n{|f_n/g_n − 1| ≥ ε} = 0 for every ε > 0.
EXAMPLE ...

Besides the variational distance we shall use the "χ²-distance"

D(Q_0, Q_1) = [∫ (f_1/f_0 − 1)² dQ_0]^{1/2},

the "Kullback-Leibler distance"

K(Q_0, Q_1) = ∫ (−log f_1/f_0) dQ_0,

and the "Hellinger distance"

H(Q_0, Q_1) = [2(1 − ∫ (f_0 f_1)^{1/2} dμ)]^{1/2}.    (3.3.5)
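The relations between these distances are easy to check numerically for discrete distributions; the two mass functions below are arbitrary illustrative choices, and D here denotes the χ²-distance in the sense just defined:

```python
import math

# Numerical probe of the distance inequalities for two discrete
# distributions dominated by each other.
f0 = [0.5, 0.3, 0.2]
f1 = [0.4, 0.4, 0.2]

H = math.sqrt(2 * (1 - sum(math.sqrt(a * b) for a, b in zip(f0, f1))))
K = sum(a * (-math.log(b / a)) for a, b in zip(f0, f1))
D = math.sqrt(sum(a * (b / a - 1) ** 2 for a, b in zip(f0, f1)))
var = sum(abs(a - b) for a, b in zip(f0, f1))   # ||Q0 - Q1||

assert H <= D              # Hellinger vs. chi-square distance
assert H <= math.sqrt(K)   # Kullback-Leibler bound, cf. (3.3.9)
assert var <= D            # variational vs. chi-square distance
```

The first assertion rests on |x^{1/2} − 1| ≤ |x − 1|, the second on Jensen's inequality, and the third on the Schwarz inequality applied to ∫|1 − f_1/f_0| dQ_0.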
H(Q_0, Q_1) ≤ D(Q_0, Q_1).    (3.3.6)

PROOF. Check that

H(Q_0, Q_1)² = ∫ ((f_1/f_0)^{1/2} − 1)² dQ_0 ≤ ∫ (f_1/f_0 − 1)² dQ_0 = D(Q_0, Q_1)²

since |x^{1/2} − 1| ≤ |x − 1| for x ≥ 0. Moreover,

||Q_0 − Q_1|| = ∫ |f_0^{1/2} − f_1^{1/2}|(f_0^{1/2} + f_1^{1/2}) dμ
≤ [∫ (f_0^{1/2} − f_1^{1/2})² dμ]^{1/2} [∫ (1 + (f_1/f_0)^{1/2})² dQ_0]^{1/2}.  □
Note that (3.3.7) does not hold if the condition that Q_1 is dominated by Q_0 is omitted. Without this condition one can easily prove (use (3.3.5)) that

H(Q_0, Q_1) ≤ [2D(Q_0, Q_1)]^{1/2}.

Under the condition of Lemma 3.3.9 it is clear that ||Q_0 − Q_1|| ≤ D(Q_0, Q_1). This inequality can slightly be improved by applying the Schwarz inequality to ∫ |1 − f_1/f_0| dQ_0. We have    (3.3.8)
Another bound for the Hellinger distance (and thus for the variational distance) can be constructed by using the Kullback-Leibler distance. This bound is nontrivial if Q_0 is dominated by Q_1. We have

H(Q_0, Q_1) ≤ K(Q_0, Q_1)^{1/2}.    (3.3.9)

A modification and the proof of this result can be found in Appendix 3.
The use of the Kullback-Leibler distance has the following advantages: If f_1/f_0 is the product of several terms, say, g_i then we get an upper bound of log(f_1/f_0) by summing up estimates of log(g_i). Moreover, it will be extremely
D(⊗_{i=1}^k Q_i, ⊗_{i=1}^k P_i) ≤ exp[2^{−1} Σ_{i=1}^k D(Q_i, P_i)²] (Σ_{i=1}^k D(Q_i, P_i)²)^{1/2}.
PROOF. Ad (i): Suppose that Q_i and P_i have the μ_i-densities f_i and g_i. By (3.3.5),

H(⊗_{i=1}^k Q_i, ⊗_{i=1}^k P_i)² = 2[1 − Π_{i=1}^k ∫ (f_i g_i)^{1/2} dμ_i]
= 2[1 − Π_{i=1}^k (1 − 2^{−1} H(Q_i, P_i)²)] ≤ Σ_{i=1}^k H(Q_i, P_i)²

where the final step follows from Π_{i=1}^k (1 − u_i) ≥ 1 − Σ_{i=1}^k u_i for u_1, ..., u_k ∈ [0, 1].  □
Consequently,

||⊗_{i=1}^k Q_i − ⊗_{i=1}^k P_i|| ≤ 2(Σ_{i=1}^k H(Q_i, P_i)²)^{1/2}.    (3.3.10)

In particular,

||Q^k − P^k|| ≤ k||Q − P||,    (3.3.11)

and by (3.3.10),

||Q^k − P^k|| ≤ 2k^{1/2}H(Q, P).    (3.3.12)
Thus, if ||Q − P|| and H(Q, P) are of the same order (Example 3.3.8 treats an exceptional case where this is not true) then (3.3.12) provides a more accurate inequality than (3.3.11). From (3.3.10) it is obvious that also ||Q^k − P^k|| ≤ k^{1/2}D(Q, P). A refinement of this inequality will be studied in Appendix 3.
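The sub-additivity bound (3.3.11) can be verified exactly for small product spaces by enumeration; the Bernoulli parameters are an illustrative choice, and ||·|| is taken as the L1 (variational) distance:

```python
from itertools import product

# Exact check of ||Q^k - P^k|| <= k ||Q - P|| (3.3.11) for Bernoulli
# measures.
def bernoulli(p):
    return {0: 1 - p, 1: p}

def tv(q, p):
    # variational distance sum |q(x) - p(x)| over the common support
    support = set(q) | set(p)
    return sum(abs(q.get(x, 0.0) - p.get(x, 0.0)) for x in support)

def power(dist, k):
    # k-fold product measure by enumeration of all outcomes
    out = {}
    for outcome in product(sorted(dist), repeat=k):
        mass = 1.0
        for x in outcome:
            mass *= dist[x]
        out[outcome] = mass
    return out

Q, P = bernoulli(0.5), bernoulli(0.6)
for k in (1, 2, 3, 4):
    assert tv(power(Q, k), power(P, k)) <= k * tv(Q, P) + 1e-12
```

For k = 1 the two sides coincide; the slack grows with k, which is exactly where (3.3.12) becomes the sharper inequality.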
TQ(B) := Q{T ∈ B}.
Thus, in this context, the symbol T also denotes a map from one family of
probability measures into another family.
The following result is obvious.
Lemma 3.3.12.

||TQ − TP|| ≤ ||Q − P||.
PROOF. We repeat in short the arguments in Pitman [1979, (2.2)]. Let g_0 and f_0 be μ-densities of Q and P where w.l.g. μ is a probability measure. If g_1 ∘ T and f_1 ∘ T are conditional expectations of g_0 and f_0 given T (relative to μ) then g_1 and f_1 are densities of TQ and TP w.r.t. Tμ.
Thus, by applying the Schwarz inequality for conditional expectations [see e.g. Chow and Teicher (1978), page 215] to the conditional expectation of (g_0 f_0)^{1/2} given T we obtain in a straightforward way that

∫ (g_0 f_0)^{1/2} dμ ≤ ∫ (g_1 f_1)^{1/2} d(Tμ).
PROOF. Check that ∫ (f_1)² d(TQ) ≤ ∫ (f_0)² dQ where f_0 is a Q-density of P and f_1 is a TQ-density of TP. Moreover, use arguments similar to those in the proof of Lemma 3.3.13.  □
1. Prove that

P{X_{n:n} ≤ x} − 1 − Σ_{j=1}^k (−1)^j S_j(x)   is   ≥ 0 if k is odd,   ≤ 0 if k is even,

with

S_j(x) = Σ_{1≤i_1<...<i_j≤n} P{X_{i_1} > x, ..., X_{i_j} > x},   j = 1, ..., n.
3. Prove that

N(μ_1, σ_1²)(B) − N(μ_0, σ_0²)(B) = (σ_0/σ_1 − 1) ∫_B [1 − ((x − μ_0)/σ_0)²] dN(μ_0, σ_0²)(x)
+ ((μ_1 − μ_0)/σ_0²) ∫_B (x − μ_0) dN(μ_0, σ_0²)(x) + O[((μ_1 − μ_0)/σ_0)² + (σ_0/σ_1 − 1)²].
4. Prove that

||P_n − P_0|| → 0,   n → ∞,   iff P_n → P_0 weakly

(see Ibragimov, 1956, and Reiss, 1973).
5. Let ν_1 and ν_2 be finite signed measures on a measurable space (S, ℬ). Let ℳ be a system of [0, 1]-valued, ℬ-measurable functions defined on S.
(i) Define ℱ = {ψ^{−1}(t, 1]: t ∈ [0, 1], ψ ∈ ℳ}. Then, ...
(ii) As a special case we obtain for the system ℳ of all ℬ-measurable, [0, 1]-valued functions that ...
(iii) If ℳ is the system of all [0, 1]-valued, unimodal functions on the real line then

sup_{ψ∈ℳ} |∫ ψ dν_1 − ∫ ψ dν_2| = ...
Fl (t), Fl
teAm
7. Prove that
sup
B
(n - Fo(C)).
=F e2}'
(i)
(Reiss, 1980)
9. (Jensen inequality)
Let h be a convex function on an open interval I and let ξ be a r.v. such that ξ and h(ξ) are finitely integrable. Then,
P{ sup_{q∈(0,1)} n^{1/2}|G_n^{−1}(q) − q| > ε } = P{ sup_{q∈(0,1)} n^{1/2}|G_n(q) − q| > ε }.
Bibliographical Notes
This chapter is not central to our considerations and so it suffices to only
make some short remarks.
Exponential bounds for order statistics related to (3.1.2) have been discovered and successfully applied by different authors (e.g. Reiss (1974a, 1975a),
Wellner (1977)).
The upper bound for the variational distance using the Kullback-Leibler
distance was established by Hoeffding and Wolfowitz (1958). In this context
we also refer to Ikeda (1963, 1975) and Csiszar (1975). The upper bound for
the variational distance between products of probability measures by using
the variational distance between the single components was frequently proved
in various articles, nevertheless, this inequality does not seem to be well
known. It was established by Hoeffding and Wolfowitz (1958) and generalized
by Blum and Pathak (1972) and Sendler (1975). The extension to signed
measures (see Lemma A.3.3) was given in Reiss (1981b). Investigations along
these lines allowing a deviation from the independence condition are carried
out by Hillion (1983).
PART II
ASYMPTOTIC THEORY
CHAPTER 4
Approximations to Distributions of
Central Order Statistics
Under weak conditions on the underlying d.f. it can be proved that central (as well as intermediate) order statistics are asymptotically normally distributed. This result easily extends to the case of the joint distribution of a fixed number of central order statistics. In Section 4.1 we shall discuss some conditions which yield the weak and strong asymptotic normality of central order statistics.
Expansions of distributions of single central order statistics will be established in Section 4.2. The leading term in such an expansion is the normal distribution, whereas the higher order terms are given by integrals of polynomials w.r.t. the normal distribution. These expansions differ from the well-known Edgeworth expansions for distributions of sums of independent r.v.'s in that the higher order terms do not only depend on the sample size n but also on the index r of the order statistic. In the particular case of sample quantiles the accuracy of the normal approximation is shown to be of order O(n^{−1/2}).
In Section 4.3 it is proved that the usual normalization of joint distributions of order statistics makes these distributions asymptotically independent of the underlying d.f. This result still holds under conditions where the asymptotic normality is not valid.
In Section 4.4 we give a detailed description of the multivariate normal distribution which will serve as an approximation to the joint distribution of central order statistics.
Combining the results of the Sections 4.3 and 4.4, the asymptotic normality and expansions of the joint distribution of order statistics X_{r_1:n}, ..., X_{r_k:n} (with 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1) are proven in Section 4.5. It is shown that the accuracy of this approximation is of order
O( (Σ_{i=1}^{k+1} (r_i − r_{i−1})^{−1})^{1/2} )
under weak regularity conditions. These approximations again hold w.r.t. the
variational distance.
Some supplementary results concerning the dJ.'s of order statistics and
moderate deviations are collected in the Sections 4.6 and 4.7.
P{a_{r,n}^{−1}(U_{r:n} − b_{r,n}) ≤ t} → Φ(t),   n → ∞,    (4.1.1)

for every t where a_{r,n} = (r(n − r + 1))^{1/2}/(n + 1)^{3/2} and b_{r,n} = r/(n + 1).
Since Φ is continuous we also know that the convergence in (4.1.1) holds uniformly in t. In this sequel, we prefer to write a(n) and b(n) instead of a_{r(n),n} and b_{r(n),n}, thus suppressing the dependence on r(n).
If (r(n)/n − q) = o(n^{−1/2}) for some q ∈ (0, 1) (a condition which is e.g. satisfied in the case of sample q-quantiles) another natural choice of the constants a(n) and b(n) is a(n) = (q(1 − q))^{1/2}/n^{1/2} and b(n) = q.
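The normal approximation for the sample median of uniform r.v.'s, with the norming constants a(n) = (q(1 − q))^{1/2}/n^{1/2} and b(n) = q just described, is easy to see in simulation (sample size and replication count are arbitrary choices):

```python
import math
import random

# Monte Carlo illustration of (4.1.1) for the sample median of uniforms.
random.seed(4)
n, reps, q = 401, 2000, 0.5
a_n = math.sqrt(q * (1 - q)) / math.sqrt(n)

standardized = []
for _ in range(reps):
    u = sorted(random.random() for _ in range(n))
    median = u[n // 2]          # the middle order statistic for odd n
    standardized.append((median - q) / a_n)

mean = sum(standardized) / reps
sd = math.sqrt(sum((z - mean) ** 2 for z in standardized) / reps)
# an approximately standard normal sample: mean near 0, sd near 1
assert abs(mean) < 0.1
assert 0.85 < sd < 1.15
```

A finer check would compare the empirical d.f. of the standardized medians with Φ; the moment check above already reflects the correct normalization.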
Applying (1.1.8) we obtain

P{a(n)^{−1}(U_{r(n):n} − b(n)) ≤ t} = Φ(t) + o(1),   n → ∞,    (4.1.2)

and hence

P{a′(n)^{−1}(X_{r(n):n} − b′(n)) ≤ t} = P{U_{r(n):n} ≤ F(b′(n) + ta′(n))} = Φ(t) + o(1),   n → ∞,    (4.1.3)

whenever a(n)^{−1}[F(b′(n) + ta′(n)) − b(n)] → t, n → ∞.
In particular, with b′(n) = F^{−1}(q) and a′(n) = (q(1 − q))^{1/2}/(n^{1/2}F′(F^{−1}(q))),

P{ (n^{1/2}F′(F^{−1}(q))/(q(1 − q))^{1/2}) (F_n^{−1}(q) − F^{−1}(q)) ≤ t } → Φ(t),   n → ∞.    (4.1.4)

   (4.1.5)
P{ ((n + 1)^{3/2} f(F^{−1}(r(n)/(n + 1))) / (r(n)(n − r(n) + 1))^{1/2}) (X_{r(n):n} − F^{−1}(r(n)/(n + 1))) ≤ t } → Φ(t),   n → ∞,    (4.1.6)
for every t. The proof is straightforward and can be left to the reader.
When treating intermediate order statistics the underlying d.f. F has to satisfy certain regularity conditions on a neighborhood of α(F) or ω(F). From this point of view intermediate order statistics are connected with extreme order statistics. The extreme value theory will provide conditions better tailored to this situation than those stated in Example 4.1.2 (see Theorem 5.1.7).
n .....
00,
(4.1.7)
for every t = (t_1, ..., t_k) where Φ_Σ is the d.f. of the k-variate normal distribution with mean vector zero and covariances q_i(1 − q_j) for 1 ≤ i ≤ j ≤ k. As a special case we have
n .....
00.
(4.1.8)
n .....
00,
(4.1.9)
n .....
00.
Next, the problem arises to extend (4.1.9) to a certain class of d.f.'s F. This is again possible by using the transformation technique.

Theorem 4.1.4. (i) Let q ∈ (0, 1) be fixed. Assume that F has a derivative, say, f on the interval (F^{−1}(q) − ε, F^{−1}(q) + ε) for some ε > 0. Moreover, assume that f is continuous at F^{−1}(q) and that f(F^{−1}(q)) > 0. Then, if r(n)/n → q as n → ∞,
sup
B
n --+
(ii) Moreover,
00.
(4.1.10)
00.
(4.1.11)
where the summation runs over all positive integers i. By verifying the conditions of Example 4.1.1 we shall obtain that the d.f.'s of the standardized sample medians weakly converge to the standard normal d.f. Φ. Since
n
2i
+ 1 (1
i~ T+1
it is easily seen that f f(x) dx =
F(2n
2i - 2i
1)
+ 1 = 2(n + 1)
(4.1.12)
1. By (4.1.12),
+ 1)
2(n
1)
2 1+
1)
2n+l(
if
+ ~ x - 2n + 1
XE[2(n~ 1)'2n~ IJ
XE[2n~ 1'21nJ
This implies that x − x² ≤ F(x) − 1/2 ≤ x for |x| ≤ 1/2 showing that F is differentiable at F^{−1}(1/2) = 0 with F^{(1)}(0) = 1. Thus, by Example 4.1.1,

P{2n^{1/2} X_{[n/2]:n} ≤ t} → Φ(t),   n → ∞,

for every t,
P{2nl/2 X[n/21:n
1,
(4.1.13)
(4.1.14)
Assume that

sup_B |P{a(n)^{−1}(Y_{r(n):n} − b(n)) ∈ B} − ∫_B h(x) dx| → 0,   n → ∞.    (4.1.15)

Then we have

sup_B |P{a′(n)^{−1}(X_{r(n):n} − b′(n)) ∈ B} − ∫_B g(x)h(G(x)) dx| → 0,   n → ∞,    (4.1.16)

if the functions

S_n(x) = a(n)^{−1}[F_0^{−1}(F_1(b′(n) + xa′(n))) − b(n)]

are
(a) strictly increasing and absolutely continuous on intervals (α(n), β(n)) where α(n) → −∞ and β(n) → ∞, and
(b) such that S_n(x) → G(x) and S_n′(x) → g(x) for almost every x.
PROOF. Write T_n(x) = a′(n)^{−1}[F_1^{−1}(F_0(b(n) + xa(n))) − b′(n)]. Since F_0 is continuous we obtain from Corollary 1.2.6 that

P{a′(n)^{−1}(X_{r(n):n} − b′(n)) ∈ B} = P{T_n[a(n)^{−1}(Y_{r(n):n} − b(n))] ∈ B}

and hence, by (4.1.15),

sup_B |P{a′(n)^{−1}(X_{r(n):n} − b′(n)) ∈ B} − ∫_B g(x)h(G(x)) dx|
≤ sup_B |∫_{{T_n ∈ B}} h(x) dx − ∫_B g(x)h(G(x)) dx| + o(1),   n → ∞.    (4.1.17)
The image of (α(n), β(n)) under S_n, say, J_n is an open interval, and T_n|J_n is the inverse of S_n|(α(n), β(n)). By P.1.11,

∫_{{T_n ∈ B}} h(x) dx = ∫_B h_n(x) dx    (4.1.18)

for every Borel set B ⊂ J_n where h_n = S_n′ (h ∘ S_n) 1_{(α(n), β(n))}. Notice that w.l.g. S_n′ can be assumed to be measurable. Since ∫ h_n(x) dx ≤ 1 and h_n → g(h ∘ G) almost everywhere the Scheffé lemma 3.3.2 yields

sup_B |∫_B h_n(x) dx − ∫_B g(x)h(G(x)) dx| → 0,   n → ∞,

and hence

sup_B |∫_{{T_n ∈ B}} h(x) dx − ∫_B g(x)h(G(x)) dx| → 0,   n → ∞.    (4.1.19)  □
Whereas the constants a(n) and b(n) are usually predetermined, the constants a′(n) and b′(n) should be chosen in a way such that S_n fulfills the required conditions. If G(x) = x and g(x) = 1 (that is, the limiting expressions in (4.1.15) and (4.1.16) are equal) then a natural choice of the constants a′(n) and b′(n) is

b′(n) = F_1^{−1}(F_0(b(n)))   and   a′(n) = a(n)/(F_0^{−1} ∘ F_1)′(b′(n)).    (4.1.20)
PROOF OF THEOREM 4.1.4. We shall only prove (4.1.10) since (4.1.11) and (iii) follow in an analogous way.
Lemma 4.1.6 will be applied to F_0 being the uniform d.f. on (0, 1), F_1 = F, a(n) = (r(n)(n − r(n) + 1))^{1/2}/(n + 1)^{3/2}, b(n) = r(n)/(n + 1), h = φ, g = 1 and G(x) = x. (4.1.15) holds according to (4.1.9). Moreover, choose b′(n) = F^{−1}(b(n)) and a′(n) = a(n)/f(b′(n)). Since f is continuous at F^{−1}(q) and f(F^{−1}(q)) > 0 we know that f is strictly positive on an interval (F^{−1}(q) − κ,
r (1 + ~f
,=1 L
JB
i r n)dNc.o.1)1
(4.2.1)
a;:!(Ur:n - br n) = ((ex
+ P)312/(exP)112)(Ur:n -
ex/(ex
+ P))
leXp (X 2/2)g(x) -
(1
+ ~t:
hi)l:::; C[(a
(1)
for Ixl :::; [a/3/(a + /3)]1 /6 where hi are the polynomials as described in Corollary A.2.3. Define the signed measure v by
:::; C((a
+ /3)/(a/3))m /2
+ P{((a
:::; c((a
A} - v(A)1
+ /3)3 /2/(a/3)1/2)(Ur:n -
a/(a
+ /3)) B} + Ivl(B
l)
C)
+ /3)/(a/3)t I2 .
Addendum 4.2.2. The application of Lemma 3.2.5 in the proof of Theorem 4.2.1
gives a more precise information about the polynomials Li,r,n'
(i) The polynomials Li,r,n are recursively defined by
LI,r,n = hI,r,n -
f hI,r,n dN.
(0,1)
i-l
- k~l
and
1
L2 r n(x) = (
1)(
1) [en - 2r
..
rn-r+
n+
[7(n - 2r
+ If + 3r(n -
+ 1)](x4 -
+ 1)
26
(x - 15)/18 -
3)/12 - (n - r
+ 1)2(x 2 -
1)].
∫ L_{1,r,n} dN_{(0,1)} = 0,

so that for symmetric sets the normal approximation is of order O(n/(r(n − r))).
(c) Numerical calculations show that for n = 1, 2, ..., 250 we can take C_1 = .14 and C_2 = .12 in Theorem 4.2.1.
Ixl <
e,
(4.2.3)
Then there exist constants C > 0 and d ∈ (0, 1) [which only depend on m] such that
(i) S is strictly increasing on the interval I = (−dε, dε).
(ii) For every monotone, real-valued function T such that the restriction of T to the set S(I) is the inverse of the restriction S|I we have

sup_B | ∫_{{T∈B}} (1 + Σ_{i=1}^{m−1} R_i) dN_{(0,1)} − ∫_B (1 + Σ_{i=1}^{m−1} L_i) dN_{(0,1)} | ≤ C exp(−mε)    (4.2.4)
and

L_2(x) = R_2(x) + α_1[x²R_1′(x)/2 + (x − x³/2)R_1(x)] + α_1²[x⁶/8 − 5x⁴/8] + α_2(x²/2 − x⁴/6).
Since eP exp( - e) is uniformly bounded on [0, CfJ) for every p ;::: 1 there
exists d E (0, 1) such that
PROOF.
S'(x) ;::: 1 -
Ixl
i=l
~ de.
(1)
-de/2
and
(2)
S(de);::: de/2.
m-l
Xi+l
) I
Ixl +1
IS(x) - ( x + ;~ (i + I)! IX; ~ (m + 1)!IX
m
Ixl < e.
(3)
(4)
m,
r (1 + mf R )dN(o.I r h(x)dx
(5)
~~1 R;(S'(X).
(6)
J{TEB)
,=1
JB
where
h(x) = S'(x)(J)(S(x ( 1 +
118
I<p(S(x -
<p(x) (1
+ ~~1
wi(x)(S(x) - X)i) I
+ 8(S(x) -
x11 S(x) - xl
(7)
for |x| ≤ dε and θ ∈ (0, 1). Moreover, w_i = φ^{(i)}/(i!φ) is a polynomial of degree ≤ i and C denotes a generic constant which only depends on m. For i = 1, 2 we get

w_1(x) = −x   and   w_2(x) = (x² − 1)/2.
Writing

ψ(x) = Σ_{i=1}^{m−1} (x^{i+1}/(i + 1)!) α_i,
Ih(x) -
<p(x) [1
+ tjJ(1)(x)] [1 + ~:
w;(x)tjJ(i)(x) ] [ 1
+ ~t:
Ri(X
+ tjJ(x ] I (8)
+ IxI 6 (m+1)2)
Ih(x) -
<P(x{ 1 +
me)(1
+ IxI 6 (m+1)2)
(9)
for Ixl < de where Li are polynomials which have the asserted property. From
(5) and (9) we deduce by integration that
If (1 ~f
: ; f.
{TEB}
,=1
Ri)dN(o.l) -
Ih(X) -
If (1 ~f
{TEB}
,=1
c (-
(1 + ~:
r (1 + ~f
JB
,=1
L i)dN(o.l)1
Li(X)<p(X)ldX::; Cexp(-me)
Ri)dN(o. 1)
(10)
fB (1 + ~f
L i)dN(o.l)1
,=1
c (-
(11 )
Note that Lemma 4.2.3 still holds if the condition that S has a continuous
derivative is replaced by the weaker condition that S is absolutely continuous.
S_{r,n}(x) = a_{r,n}^{−1}(F[F^{−1}(b_{r,n}) + xa_{r,n}/f(F^{−1}(b_{r,n}))] − b_{r,n}).

Then an expansion holds for

sup_B | P{a_{r,n}^{−1}f(F^{−1}(b_{r,n}))[X_{r:n} − F^{−1}(b_{r,n})] ∈ B} − ∫_B (1 + Σ_{i=1}^{m−1} L_{i,r,n}) dN_{(0,1)} |    (4.2.5)

where L_{i,r,n} is a polynomial of degree ≤ 3i. Moreover, α_{j,r,n} = S_{r,n}^{(j+1)}(0) for j = 1, ..., m − 1 and α_{m,r,n} = sup{|S_{r,n}^{(m+1)}(x)|: x ∈ I_{r,n}}.
PROOF. Throughout the proof, the indices rand n will be suppressed. Writing
(1)
IP{a- 1f(F:=:;;
(b))[Xr:n - F-1(b)]
B} -
{TEB}
(1 + ~f
R;)dN(O,1)1
,=1
C(n/r(n - r)t/2
I (1 + ~f R;)dN(O,
IJ{TEB)
1) -
,=1
:=:;; C [(n/r(n
- r))m/2
JB
(1 + ~f
,=1
L;) dN(o, 1)
I
(2)
)=1
o [(n/r(n -
r1/2
+ n;t~X l(Xj,r,nlijjJ.
j;l
(iii) For i = 1, 2, we have (with R_{i,r,n} denoting the polynomials of (4.2.2)),

L_{1,r,n}(x) = R_{1,r,n}(x) + α_{1,r,n}(x − x³/2)

and

L_{2,r,n}(x) = R_{2,r,n}(x) + α_{1,r,n}[x²R_{1,r,n}′(x)/2 + (x − x³/2)R_{1,r,n}(x)] + α_{1,r,n}²(x⁶/8 − 5x⁴/8) + α_{2,r,n}(x²/2 − x⁴/6).
r (1 + ~f,;1 Li,r,n)d~o'l)/
JB
(4.2.6)
In particular, for i = 1, 2,
L1,r,ix) = (r(n - r
+ l)(n + 1)f1/2[(2n -
+ 2)x 3 /6 -
(n - r + l)x],
and
L 2,r,n(x) = R 2,r,n(X) + ((n - r + l)(n + lWl[r(-5x 6/24 + 15x 4 /8 -- 5x 2/2)
- (n
- 3x 2/2)]
G- 1(br,n) -log[(r(n - r
Using this representation one verifies the required smoothness of S_{r,n} on the interval I_{r,n}. Moreover, by straightforward calculations we obtain α_{m,r,n} ≤ C(n/(r(n − r + 1)))^{m/2} where C is a universal constant. Thus, Theorem 4.2.4 is applicable and yields the assertion.  □
Numerical computations show that one can take C_1 = .15 and C_2 = .12 in Corollary 4.2.7 for n = 1, ..., 250. From the expansion of length 2 in Corollary 4.2.7 we obtain the following upper bound of the remainder term of the normal approximation. We have

∫ L_{2,r,n} dN_{(0,1)} = [8(n − r + 1)² + 8r(n − r + 1) + 5r²] / [12r(n − r + 1)(n + 1)] ≤ 2(n + 1)/(3r(n − r + 1)).    (4.2.7)
≤ C [ sm / (n(n − s − m)) ]^{1/2}    (4.2.8)
where C > 0 is a universal constant. Thus, if s and m are fixed then the upper bound is of order O(n^{−1}). If s is fixed and (n − m)/n bounded away from 0 and 1 then the bound is of order O(n^{−1/2}). Finally, if s is fixed and n − m = o(n) then the bound is of order O((n − m)^{−1/2}). This shows that extremes and intermediate order statistics are asymptotically independent.
The proof of (4.2.8) is based on Theorem 1.8.1 and Theorem 4.2.1. Conditioning on U_{n−m+1:n} one obtains

P{(U_{s:n}, U_{n−m+1:n}) ∈ B} − P{(V_{s:n}, V_{n−m+1:n}) ∈ B} = ET(U_{n−m+1:n})    (4.2.9)

where

T(x) = P{xU_{s:n−m} ∈ B_x} − P{U_{s:n} ∈ B_x}.
Theorem 4.2.8. Let X_{i:n} be the ith order statistic of n i.i.d. random variables with common d.f. F. Given 1 ≤ s < n − m + 1 ≤ n we consider two vectors of order statistics, namely, X_l and X_u. Then

sup_B |P{(X_l, X_u) ∈ B} − ...| ≤ C [ sm / (n(n − s − m)) ]^{1/2}.    (4.2.10)

Likewise, given in addition k < r and s < n − m + 1 ≤ n we obtain

sup_B |P{(X_z, X_c, X_u) ∈ B} − ...| ≤ C [ k(n − r) / (n(r − k)) + sm / (n(n − s − m)) ]^{1/2}.    (4.2.11)

Both theorems are deduced from (4.2.8) by means of the quantile transformation and by conditioning on order statistics.
   (4.3.1)

where U_{r:n} is the rth order statistic of n i.i.d. (0, 1)-uniformly distributed r.v.'s. Notice that the error bound above is sharp since the second term of the expansion of length two depends on the density f.
Let 0 < b_1 < ... < b_k < 1. Then

sup_B | P{[f(F^{−1}(b_i))(X_{r_i:n} − F^{−1}(b_i))]_{i=1}^k ∈ B} − P{[U_{r_i:n} − b_i]_{i=1}^k ∈ B} | ≤ C[c(f) + c(f)² + n^{−1/2}] ...

where c(f) = max_{j=1,2} [sup_{y∈I}|f^{(j)}(y)| / inf_{y∈I} f^{j+1}(y)].
At the end of this section we shall give an example showing that Theorem 4.3.1 does not hold for r_i − r_{i−1} = 1. It is difficult to make a conjecture whether the result holds for r_i − r_{i−1} = 2 or r_i − r_{i−1} = 3. As we will see in the proof of Theorem 4.3.1 one reason for the restriction r_i − r_{i−1} ≥ 4 is that the supports of the two joint distributions are unequal.
Theorem 4.3.1 is a slight improvement of Theorem 2.1 in Reiss (1981b) which was proved under the stronger condition that r_i − r_{i−1} ≥ 5. Therefore, the proof is given in its full length. Another reason for running through all the technical details is to facilitate and to encourage further research work. Theorem 4.3.1 may be of interest as a challenging problem that can only be solved when having a profound knowledge of the distributional properties of order statistics.
Theorem 4.3.1 also serves as a powerful tool to prove various results for
order statistics. As an example we mention a result of Section 4.5 stating that
several order statistics of i.i.d. exponential r.v.'s are jointly asymptotically
normal. By making use of Theorem 4.3.1, this may easily be extended to other
r.v.'s. However, one should notice that a stronger result may be achieved by
using a method adjusted to the particular problem. Thus, applications of
Theorem 4.3.1 will lead to results of a preliminary character which may
stimulate further research work. Another application of Theorem 4.3.1 will
concern linear combinations of order statistics (see Section 6.2).
PROOF OF THEOREM 4.3.1. Part I. We write μ_i = F^{−1}(b_i), f_i = f(μ_i) and, more generally, f_i^{(j)} = f^{(j)}(μ_i). Denote by Q_0 and Q_1 the distributions of

(U_{r_i:n} − b_i)_{i=1}^k   and   (f_i(X_{r_i:n} − μ_i))_{i=1}^k.

Then

sup_B |Q_0(B) − Q_1(B)| ≤ [2(Q_0(A^c) + ∫_A (−log g_1/g_0) dQ_0)]^{1/2}    (1)

for some Borel set A to be fixed later. The main difficulty of the proof is to obtain a sharp lower bound of ∫_A log(g_1/g_0) dQ_0.
We have densities g_0 and g_1 of Q_0 and Q_1, where K is a normalizing constant, h_i(x) = f(μ_i + x_i/f_i)/f_i, ψ_i(x) = x_i − x_{i−1} + (b_i − b_{i−1}), and δ_i(x) = F(μ_i + x_i/f_i) − F(μ_{i−1} + x_{i−1}/f_{i−1}) − ψ_i(x) for i = 1, ..., k + 1 [with the convention that x_0 = x_{k+1} = 0, F(μ_0 + x_0/f_0) = 0 and F(μ_{k+1} + x_{k+1}/f_{k+1}) = 1]. Thus, for A ⊂ A_1 we have

∫_A (log g_1/g_0) dQ_0 = Σ_{i=1}^k ∫_A (log h_i) dQ_0 + Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A log(1 + δ_i/ψ_i) dQ_0.    (2)
Notice that
(3)
= {x: Ixd
and
(4)
Ii~
I ~f (ri ,=1
(log hJ dQo
ri-l - 1)
I: ; C[c(f)QO(Ac)2/3 k/n
log(1
+ <5i/t/Ji) dQ o l
1/ 2
+ (c(f) + c(f)2)k/n],
(5)
Moreover, for the particular set A chosen below, Q_0(A^c) is of negligible
order (7). The assertion of the theorem is immediate from (1), (2), and (5)–(7).
A Taylor expansion of log(f/f_i) about μ_i yields

    |log h_i(x) − (f_i^{(1)}/f_i^2) x_i| ≤ C(c(f) + c(f)^2) x_i^2

for x in the set introduced in (3).
Moreover, we decompose

    Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A log(1 + δ_i/ψ_i) dQ_0 = P_1 + P_2 + P_3    (9)

with

    P_1 = Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A (δ_i/ψ_i) dQ_0,
    P_2 = −2^{−1} Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A (δ_i/ψ_i)^2 dQ_0,    (10)
    P_3 = Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A [log(1 + δ_i/ψ_i) − δ_i/ψ_i + (δ_i/ψ_i)^2/2] dQ_0.
Since r_i − r_{i−1} ≥ 4 we know that P.1.23 is applicable to ∫ ψ_i^{−3} dQ_0, and hence
the Hölder inequality, Lemma 3.1.3, and Corollary 1.6.8 yield the required
bound for |P_1|.    (11)
To obtain a sharp upper bound of |P_2| one has to utilize some tedious
estimates of |δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)|. A Taylor expansion of G(y) =
F(μ_i + y x_i/f_i) − F(μ_{i−1} + y x_{i−1}/f_{i−1}) about y = 0 yields

    |δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)|
        ≤ C[|f^{(2)}(μ_i + θx_i/f_i)| |x_i|³/f_i³ + |f^{(2)}(μ_{i−1} + θx_{i−1}/f_{i−1})| |x_{i−1}|³/f_{i−1}³]
        =: η_i(x)    (12)

for every i = 2, ..., k and x ∈ A_{3,i} ∩ A_{3,i−1} where θ ∈ (0, 1). Thus, by further
Taylor expansions of F^{−1} and of derivatives of F we get (13) and

    |P_2| ≤ Σ_{i=1}^{k+1} ∫ [η_i(x)(1 + (n + 1)|x_i − x_{i−1}|)/ψ_i(x)] dQ_0(x)
         ≤ C(c(f) + c(f)²) k/n.    (14)
Moreover, the arguments used to prove (11) and (14) also lead to

    |P_3| ≤ C Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) (∫ [η_i(x) + c(f)|x_i² − x_{i−1}²|]⁶ dQ_0(x))^{1/3}    (15)
         ≤ C(c(f) + c(f)²) k/n.    (16)
The remaining estimates (17) and (18) provide bounds for |x_i − x_{i−1}| for
i = 2, ..., k. From (10), (11), (13), (17), and (18) we infer (19); moreover, the
set A can be chosen such that Q_0(A^c) ≤ Cc(f)⁴(log n)^{1/2}/n² [(20)–(22)].
This together with (1), (2), and (5)–(7) completes the proof. □
Counterexample
Theorem 4.3.1 was proved under the condition r_i − r_{i−1} ≥ 4. A counterexample in Reiss (1981b) shows that this result does not hold if r_i − r_{i−1} = 1
for i = 1, 2, ..., k.
EXAMPLE 4.3.2. Let X_{i:n} be the ith order statistic of n i.i.d. standard exponential
r.v.'s (with common d.f. G and density g). Then, if n^{1/2} = o(k(n)) and
[nq] + k(n) ≤ n, where q ∈ (0, 1) is fixed, a direct computation for the ranks
r_i = [nq] + i, i = 1, ..., k(n) [with b_i = r_i/(n + 1) and μ_i = G^{−1}(b_i)] shows
that the variational distance between the two joint distributions does not tend
to zero at the rate O((k/n)^{1/2}). Thus, the remainder term in Theorem 4.3.1 is
not of order O((k/n)^{1/2}) in this case.
Representations
Our first aim is to represent N_{(0,Σ)} as a distribution induced by the k-variate
standard normal distribution N_{(0,I)} where I denotes the unit matrix.
Obviously, N_{(0,I)} = N_{(0,1)}^k. Given 0 = λ_0 < λ_1 < ··· < λ_k < 1 define the linear
map T by

    (Tx)_i = (1 − λ_i) Σ_{m=1}^{i} [(λ_m − λ_{m−1})/((1 − λ_{m−1})(1 − λ_m))]^{1/2} x_m,   i = 1, ..., k.    (4.4.1)

Lemma 4.4.1. TN_{(0,I)} = N_{(0,Σ)}.

PROOF. Let T also denote the matrix which corresponds to the linear map.
The standard formula for normal distributions yields that TN_{(0,I)} has the
covariance matrix H = (η_{i,j}) = TT^t where T^t is the transpose of T. Thus,

    η_{i,j} = (1 − λ_i)(1 − λ_j) Σ_{m=1}^{i} (λ_m − λ_{m−1})/[(1 − λ_{m−1})(1 − λ_m)]   for i ≤ j.

Since (λ_m − λ_{m−1})/[(1 − λ_{m−1})(1 − λ_m)] = 1/(1 − λ_m) − 1/(1 − λ_{m−1}), the sum
telescopes to 1/(1 − λ_i) − 1, and hence

    η_{i,j} = λ_i(1 − λ_j)   for i ≤ j,

which is the covariance matrix Σ. □
From standard calculus for normal distributions we know that the density
of N_{(0,Σ)} is given by

    φ_{(0,Σ)}(x) = [det Σ^{−1}/(2π)^k]^{1/2} exp[−2^{−1} x^t Σ^{−1} x]    (4.4.2)

where x = (x_1, ..., x_k)^t and Σ^{−1} is the inverse matrix of Σ. By elementary
calculations and by formula (4.4.4) below we get an alternative representation
of φ_{(0,Σ)}, namely,

    φ_{(0,Σ)}(x) = [(2π)^k ∏_{i=1}^{k+1} (λ_i − λ_{i−1})]^{−1/2} exp[−2^{−1} Σ_{i=1}^{k+1} (x_i − x_{i−1})²/(λ_i − λ_{i−1})]    (4.4.3)

with the conventions x_0 = x_{k+1} = 0 and λ_{k+1} = 1. The inverse Σ^{−1} = (α_{i,j})
is the tridiagonal matrix given by

    α_{i,i} = 1/(λ_i − λ_{i−1}) + 1/(λ_{i+1} − λ_i),   i = 1, ..., k,
    α_{i,i−1} = α_{i−1,i} = −1/(λ_i − λ_{i−1}),   i = 2, ..., k,    (4.4.4)

and α_{i,j} = 0, otherwise; moreover, det Σ^{−1} = ∏_{i=1}^{k+1} (λ_i − λ_{i−1})^{−1}. To verify
this, introduce the bidiagonal matrix B = (β_{i,j}) with

    β_{i,i} = [(1 − λ_{i−1})/((1 − λ_i)(λ_i − λ_{i−1}))]^{1/2},   i = 1, ..., k,

and

    β_{i,i−1} = −[(1 − λ_i)/((1 − λ_{i−1})(λ_i − λ_{i−1}))]^{1/2},   i = 2, ..., k,

and β_{i,j} = 0, otherwise. Notice that Σ^{−1} = B^tB = (Σ_{m=1}^{k} β_{m,i}β_{m,j})_{i,j} and, thus,
α_{i,i} = β_{i,i}² + β_{i+1,i}², α_{i,i−1} = α_{i−1,i} = β_{i,i}β_{i,i−1}, and α_{i,j} = 0, otherwise. The proof
of (i) is complete.
(ii) Moreover,

    det Σ^{−1} = (det B)² = ∏_{i=1}^{k} β_{i,i}² = ∏_{i=1}^{k} (1 − λ_{i−1})/[(1 − λ_i)(λ_i − λ_{i−1})]
              = ∏_{i=1}^{k+1} (λ_i − λ_{i−1})^{−1}. □
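The matrix identities above are easy to confirm numerically. The following sketch (plain Python, all variable names are ours, not the book's) builds Σ, the lower-triangular map T of (4.4.1), and the bidiagonal matrix B for k = 3, and checks TT^t = Σ, B^tB = Σ^{−1}, and det Σ = ∏(λ_i − λ_{i−1}):

```python
import math

k = 3
lam = [0.0, 0.2, 0.5, 0.7, 1.0]   # lambda_0, ..., lambda_{k+1}, with lambda_{k+1} = 1

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# covariance matrix: sigma_{i,j} = lambda_i (1 - lambda_j) for i <= j
Sigma = [[lam[min(i, j) + 1] * (1.0 - lam[max(i, j) + 1]) for j in range(k)]
         for i in range(k)]

# lower-triangular matrix of the map T in (4.4.1); the telescoping sum gives TT^t = Sigma
T = [[(1.0 - lam[i + 1]) * math.sqrt((lam[m + 1] - lam[m])
      / ((1.0 - lam[m]) * (1.0 - lam[m + 1]))) if m <= i else 0.0
      for m in range(k)] for i in range(k)]
TTt = matmul(T, transpose(T))

# bidiagonal matrix B with B^t B = Sigma^{-1} (the beta_{i,j} following (4.4.4))
B = [[0.0] * k for _ in range(k)]
for i in range(k):
    d = lam[i + 1] - lam[i]
    B[i][i] = math.sqrt((1.0 - lam[i]) / ((1.0 - lam[i + 1]) * d))
    if i > 0:
        B[i][i - 1] = -math.sqrt((1.0 - lam[i + 1]) / ((1.0 - lam[i]) * d))
SigmaInv = matmul(transpose(B), B)
Ident = matmul(Sigma, SigmaInv)           # should be the k x k unit matrix

det_prod = 1.0
for i in range(1, k + 2):
    det_prod *= lam[i] - lam[i - 1]       # det Sigma = prod (lambda_i - lambda_{i-1})
```

The check works for any 0 = λ_0 < λ_1 < ··· < λ_k < 1; only the 3×3 determinant comparison below is tied to k = 3.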
Moments

Recall that the absolute moments of the standard normal distribution
N_{(0,1)} are given by

    ∫ |x|^j dN_{(0,1)}(x) = 1·3·5·····(j − 1)   if j is even,
                          = (2^j/π)^{1/2} ((j − 1)/2)!   if j is odd.    (4.4.5)

Moreover,

    ∫ |x| x^{j−1} dN_{(0,1)}(x) = 0   if j is even,
                                = (2^j/π)^{1/2} ((j − 1)/2)!   if j is odd,    (4.4.6)

and, by the independence of the components under N_{(0,I)},

    ∫ x_i x_{i−1}^j dN_{(0,I)}(x) = 0,   i = 2, ..., k.    (4.4.7)
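The two-case formula (4.4.5) agrees with the closed Gamma-function expression E|X|^j = 2^{j/2}Γ((j + 1)/2)/√π; a small sanity-check sketch (function names are ours):

```python
import math

def abs_moment(j):
    # closed form: E|X|^j = 2^(j/2) Gamma((j + 1)/2) / sqrt(pi) for X ~ N(0,1)
    return 2.0 ** (j / 2.0) * math.gamma((j + 1) / 2.0) / math.sqrt(math.pi)

def abs_moment_445(j):
    # the two-case formula (4.4.5)
    if j % 2 == 0:
        prod = 1.0
        for m in range(1, j, 2):          # 1 * 3 * 5 * ... * (j - 1)
            prod *= m
        return prod
    return math.sqrt(2.0 ** j / math.pi) * math.factorial((j - 1) // 2)
```

For instance, both expressions give 1, 3, 15 for j = 2, 4, 6 and (2/π)^{1/2} for j = 1.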
Throughout, ∥·∥ denotes the variational distance.
Theorem 4.5.1. For all positive integers k and r_i with 0 = r_0 < r_1 < r_2 <
··· < r_k < r_{k+1} = n + 1 the following inequality holds:

    ∥P_n − N_{(0,Σ)}∥ ≤ Cρ_n^{1/2},    (4.5.2)

where

    ρ_n := Σ_{i=1}^{k} (n − r_{i−1})/[(r_i − r_{i−1})(n − r_i + 1)].    (4.5.3)

For equidistant ranks r_i one obtains ρ_n ≤ 2k²/n, which shows that N_{(0,Σ)}
will provide an accurate approximation to P_n only if the number of order
statistics under consideration is of smaller order than n^{1/2}. From the
expansion of length 2 we shall learn that the bound in (4.5.2) is sharp.
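The bound ρ_n ≤ 2k²/n for equidistant ranks is easy to check numerically; the sketch below assumes that ρ_n takes the form Σ_{i=1}^{k} (n − r_{i−1})/[(r_i − r_{i−1})(n − r_i + 1)] suggested by the proof (the function name is ours):

```python
def rho(n, ranks):
    # rho_n = sum_i (n - r_{i-1}) / ((r_i - r_{i-1}) (n - r_i + 1)), convention r_0 = 0
    r = [0] + list(ranks)
    return sum((n - r[i - 1]) / ((r[i] - r[i - 1]) * (n - r[i] + 1))
               for i in range(1, len(r)))

# equidistant ranks with gap (n + 1)/(k + 1)
cases = [(49, [10, 20, 30, 40]), (99, list(range(10, 100, 10)))]
results = [(rho(n, rk), 2 * len(rk) ** 2 / n) for n, rk in cases]
```

For n = 49 and k = 4 this gives ρ_n = 0.5875, well below 2k²/n ≈ 0.653.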
Next we make some comments about the proof of Theorem 4.5.1. Notice
that the asymptotic normality of several order statistics holds if the corresponding spacings have this property. Let Q_n denote the joint distribution of
the normalized spacings of the given order statistics (4.5.4); the distribution
of the ith normalized spacing will be denoted by Q_{n,i}, and the passage from
spacings back to order statistics is carried out by a linear transformation as
explained in (4.5.5).
We have

    ∥Q_n − N_{(0,I)}∥ ≤ Σ_{i=1}^{k} ∥Q_{n,i} − N_{(0,1)}∥   and
    ∥Q_n − N_{(0,I)}∥ ≤ C(Σ_{i=1}^{k} H(Q_{n,i}, N_{(0,1)})²)^{1/2}    (4.5.6)

where H denotes the Hellinger distance. The first inequality and upper bounds
of ∥Q_{n,i} − N_{(0,1)}∥, i = 1, ..., k (compare with Corollary 4.2.7), lead to an
inaccurate upper bound of ∥Q_n − N_{(0,I)}∥. The second inequality is not
applicable since a bound of the Hellinger distance between Q_{n,i} and N_{(0,1)} is
not at our disposal. The way out of this dilemma will be the use of an expansion
of length two.
The expansion of length two yields

    sup_B |P_n(B) − ∫_B (1 + L_{Σ,n}) dN_{(0,Σ)}| ≤ C exp(Cρ_n) ρ_n.    (4.5.7)
PROOF. For the single normalized spacings one has

    sup_B |Q_{n,i}(B) − ∫_B (1 + L_{1,r_i−r_{i−1},n−r_{i−1}}) dN_{(0,1)}|
        ≤ C_2 (n − r_{i−1})/[(r_i − r_{i−1})(n − r_i + 1)] =: C_2 δ_i.    (1)

The bound for the variational distance between product measures via the
variational distance between the single components (compare with Corollary
A.3.4) yields

    sup_B |(×_{i=1}^{k} Q_{n,i})(B) − ∫_B ∏_{i=1}^{k} (1 + L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)) dN_{(0,I)}(x)|
        ≤ C_2 exp[2C_2 Σ_{i=1}^{k} δ_i] Σ_{i=1}^{k} δ_i.    (2)
Next we verify that the integral in (2) can be replaced by that in (4.5.7).
Lemma A.3.6, applied to g_i = L_{1,r_i−r_{i−1},n−r_{i−1}}, yields

    sup_B |∫_B ∏_{i=1}^{k} (1 + L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)) dN_{(0,I)}(x)
        − ∫_B [1 + Σ_{i=1}^{k} L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,I)}(x)|
        ≤ C exp[Σ_{i=1}^{k} δ_i] Σ_{i=1}^{k} δ_i,    (4.5.8)

and hence

    sup_B |(×_{i=1}^{k} Q_{n,i})(B) − ∫_B [1 + Σ_{i=1}^{k} L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,I)}(x)|
        ≤ C_2 exp[2C_2 ρ_n] ρ_n ≤ C exp(Cρ_n) ρ_n.    (4.5.9)

Now, the transformation, as explained in (4.5.5), yields the desired inequality (4.5.7). For this purpose apply the transformation theorem for
densities, using the explicit form of the inverse S of T. □
From (4.5.9) we also deduce for the normalized, joint distribution P_n of
order statistics that

    ∥P_n − N_{(0,Σ)}∥ ≤ ρ_n^{1/2} + O(ρ_n).    (4.5.8′)
In the following we consider the normalized order statistics

    (n + 1)^{1/2} f(F^{−1}(b_i)) (X_{r_i:n} − F^{−1}(b_i)),   i = 1, ..., k,    (4.5.11)

where X_{i:n} is the ith order statistic of n i.i.d. random variables with common
d.f. F and density f, and b_i = r_i/(n + 1). Recall that the covariance matrix Σ is
defined by σ_{i,j} = b_i(1 − b_j) for 1 ≤ i ≤ j ≤ k.
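The covariance structure σ_{i,j} = b_i(1 − b_j) can be confirmed by simulation: the following sketch (variable names are ours) compares the empirical covariances of the normalized uniform order statistics (n + 1)^{1/2}(U_{r_i:n} − b_i) with Σ:

```python
import random

random.seed(1)
n, reps = 99, 10000
ranks = [25, 50, 75]
b = [r / (n + 1) for r in ranks]          # b_i = 0.25, 0.5, 0.75

samples = []
for _ in range(reps):
    u = sorted(random.random() for _ in range(n))   # uniform order statistics
    samples.append([(n + 1) ** 0.5 * (u[r - 1] - bi) for r, bi in zip(ranks, b)])

def cov(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

cols = list(zip(*samples))
emp = [[cov(cols[i], cols[j]) for j in range(3)] for i in range(3)]
Sigma = [[b[min(i, j)] * (1 - b[max(i, j)]) for j in range(3)] for i in range(3)]
```

With n = 99 the exact covariances differ from σ_{i,j} only by a factor close to one, so the empirical matrix matches Σ up to Monte Carlo noise.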
From Theorems 4.3.1 and 4.5.1 it is easily seen that, under certain regularity
conditions,

    ∥P_n − N_{(0,Σ)}∥ = O(ρ_n^{1/2})    (4.5.12)

with ρ_n as in (4.5.3). The crucial point is that the underlying density is assumed
to possess three bounded derivatives. The aim of the following considerations
is to show that (4.5.12) holds if f has two bounded derivatives. The bound
O(ρ_n^{1/2}) is sharp as far as the normal approximation is concerned; however,
ρ_n^{1/2} is of a larger order than the upper bound in Theorem 4.3.1.
Theorem 4.5.3. Denote by P_n the joint distribution of the normalized order
statistics in (4.5.11). Assume that the underlying density f has two derivatives on the intervals I_i = (F^{−1}(b_i) − ε_i, F^{−1}(b_i) + ε_i), i = 1, ..., k, where ε_i =
5[σ_{i,i} log(n)/(n + 1)]^{1/2}/f(F^{−1}(b_i)). Moreover, assume that min(b_1, 1 − b_k) ≥
10 log(n)/(n + 1). Then there is a universal constant C > 0 such that

    ∥P_n − N_{(0,Σ)}∥ ≤ Cρ_n^{1/2}.

PROOF. Part I. First consider order statistics of i.i.d. random variables with a
d.f. G possessing a density g; we study the transformed distribution Q′_n = TQ_n
where T(x) = (T_1(x_1), ..., T_k(x_k)) and

    T_i(x_i) = (n + 1)^{1/2} (G(G^{−1}(b_i) + x_i/((n + 1)^{1/2} g_i)) − b_i),

with g_i := g(G^{−1}(b_i)) > 0, i = 1, ..., k. It has to be shown that

    ∥Q′_n − N_{(0,Σ)}∥ ≤ Cρ_n^{1/2}.    (2)
We first estimate ∥TN_{(0,Σ)} − N_{(0,Σ)}∥ where, throughout, C denotes a universal
constant that will not be the same at each appearance. Thus, it remains to
prove inequality (3) for x satisfying (4); we prefer to prove the slightly stronger
version (5), i = 1, ..., k, since this is the inequality that also has to be verified
in the second part of the proof with G replaced by F.

Denote by N_T and N_S the restrictions of N_{(0,Σ)} to the domains D_T of T and
D_S of S. Check that

    ∥TN_{(0,Σ)} − N_{(0,Σ)}∥ ≤ ∥(T∘S∘T)N_{(0,Σ)} − (T∘S)N_{(0,Σ)}∥ + ∥N_S − N_{(0,Σ)}∥
            + N_{(0,Σ)}(D_T^c) + N_{(0,Σ)}(D_S^c)
        ≤ Cρ_n + [2N_{(0,Σ)}(B^c) + ∫_B (−log(f_1/f_0)) dN_{(0,Σ)}]^{1/2}    (7)

for sets B in the domain of T, with f_0 and f_1 being the densities of N_{(0,Σ)} and
SN_{(0,Σ)}. Applying the transformation theorem for densities (1.4.4) we obtain
an explicit representation of f_1(x) for x ∈ B [(8)–(10)], with the conventions
x_0 = 0 and b_{k+1} = 1.
Define

    B = {x: x_i > −(10(log n)σ_{i,i})^{1/2}, i = 1, ..., k}.

Since 1 − Φ(x) ≤ φ(x)/x for x > 0 we obtain

    N_{(0,Σ)}(B^c) ≤ n^{−4}.    (11)
(11)
+ 1)
i = 1, ... , k,
Xi dN(o,};)(X) = 0,
(12)
1JC
b;) dN(o,};)(X)
Cn- 1
(13)
i~
(xt - Xt-1)(Xi -
xi-dd~o,};)(x) = 0,
(14)
and hence, applying (4.4.5) and (4.4.6), we obtain by means of some straightforward calculations that
i(
B
1=1
i -
i-1
() < C
(o,};) X
Pn
Combining (11), (13), and (15) we see that the assertion of Part I holds.
(15)
Part II. Notice that P_n = SQ′_n where S is defined as in (4) with G and g_i replaced
by F and f(F^{−1}(b_i)). Using Taylor expansions of log T_i′(x_i) and T_i(x_i), the proof
of this part runs along the lines of Part I. □
Final Remarks
In Reiss (1981a) one can also find expansions of length m > 2 for the joint
distribution of central order statistics of exponential r.v.'s. Starting with this
special case, one may derive expansions in the case of r.v.'s with sufficiently
smooth d.f. by using the method adopted in Reiss (1975a); that is, one has
to expand the densities and to integrate the densities over Borel sets in a more
direct way.

In the following put

    a_{r,n}² = r(n − r + 1)/(n + 1)³   and   b_{r,n} = r/(n + 1).
Continuous D.F.'s
First, the results of Section 4.2 will be rewritten in terms of d.f.'s.
Corollary 4.6.1. Under the conditions of Theorem 4.2.4 there exist polynomials
S_{i,r,n} of degree ≤ 3i − 1 such that

    sup_t |P{a_{r,n}^{−1} f(F^{−1}(b_{r,n}))(X_{r:n} − F^{−1}(b_{r,n})) ≤ t}
        − (Φ(t) + φ(t) Σ_{i=1}^{m−1} S_{i,r,n}(t))| ≤ C_m (n/[r(n − r)])^{m/2},    (4.6.1)

where the polynomials are determined by the condition (φS_{i,r,n})′ = φL_{i,r,n}.
In particular,

    S_{1,r,n}(t) = (n − 2r + 1)/(3[r(n − r + 1)(n + 1)]^{1/2}) (1 − t²) + α_{1,r,n} t²/2,    (4.6.2)

and S_{2,r,n} is a polynomial of degree 5 in t whose coefficients are of order
O([r(n − r + 1)(n + 1)]^{−1}) and are built from n − 2r + 1, α_{1,r,n}, and α_{2,r,n}
[see (4.6.3)]. In the special case of (0, 1)-uniformly distributed r.v.'s, (4.6.1)
holds with α_{i,r,n} = 0.
Discrete D.F.'s
The conditions of Theorem 4.2.4 exclude discrete d.f.'s F. The key idea of the
following is to approximate the d.f. F (which may be discrete) by some function
G which fulfills an appropriate Taylor expansion.
As an example we shall treat the case of d.f.'s F that permit an Edgeworth
expansion (like binomial d.f.'s).
We start with a technical lemma.

Lemma 4.6.3. Let X_{i:n} be the order statistics of n i.i.d. random variables with
common d.f. F. Let G be a function and u a fixed real number such that for
all reals y,

    |G(u + y) − G(u) − Σ_{i=1}^{m} (c_i/i!) y^i| ≤ (c_{m+1}/(m + 1)!) |y|^{m+1}.    (4.6.5)
Then, if c_1 > 0, there exist a universal constant C_m > 0 and polynomials S_{i,r,n}
of degree ≤ 3i − 1 such that for all reals t the following inequality holds:

    |P{a_{r,n}^{−1} c_1 (X_{r:n} − u) ≤ t} − (Φ(t) + φ(t) Σ_{i=1}^{m−1} S_{i,r,n}(t))|
        ≤ C_m [(n/[r(n − r + 1)])^{m/2} + a_{r,n}^m max_{j=1,...,m} (c_{j+1}/c_1)^{m/j}
            + a_{r,n}^{−1}(sup_x |F(x) − G(x)| + |G(u) − b_{r,n}|)].    (4.6.6)

PROOF. Condition (4.6.5) allows us to replace the event {a_{r,n}^{−1} c_1 (X_{r:n} − u) ≤ t}
by an event for the uniform order statistic U_{r:n}, shifted by V(t) + a_{r,n}^{−1}(G(u) − b_{r,n}),
so that

    |P{a_{r,n}^{−1} c_1 (X_{r:n} − u) ≤ t} − [Φ(V(t)) + φ(V(t)) Σ_{i=1}^{m−1} S_{i,r,n}(V(t))]|
        ≤ C_m [(n/[r(n − r + 1)])^{m/2} + a_{r,n}^{−1}(sup_x |F(x) − G(x)| + |G(u) − b_{r,n}|)],

where

    V(t) = t + Σ_{i=2}^{m} (c_i/(i! c_1)) a_{r,n}^{i−1} t^i + e_m(t)   with
    |e_m(t)| ≤ (c_{m+1}/((m + 1)! c_1)) a_{r,n}^m |t|^{m+1}.    (4.6.8)

A Taylor expansion of Φ and φ about t then yields (4.6.6). □
(4.6.8)
Since GM,N is an approximation to FN we know that GZt,N is an approximation to Fli 1. As an application of Lemma 4.6.3 to F == FN, G == GM,N' and
u = GZt,N(br,n) we obtain the following
Corollary 4.6.4. Under condition (4.6.7) there exists em, M > 0 such that for every
positive integer n, r E {1, ... , n} and tEl:
IP{X"n :s; t} - ( <I>
(4.6.9)
where
SM(t) = a;'~ G~,N[GZt,N(br,n)] (t - GZt,N(br,n))
and the Si,r,n are the polynomials of Lemma 4.6.3 with Ci = GX},N(GZt,N(br,n))'
PROOF. To make Lemma 4.6.3 applicable one has to verify that
GM,N(GZt,N(br,n)) = br,n
+ O(N-m/2).
(1)
It suffices to prove that (1) holds uniformly over all rand n such that
1<1>-1 (br,n)1 = O(log N). A standard technique [see Pfanzagl (1973c), page 1016]
yields
(2)
where F_N denotes the binomial d.f. with parameters N and p. The following
table shows the maximum over k = 0, ..., N of the error

    |P{X_{r:n} ≤ k} − Φ[··· Φ^{−1}(b_{r,n}) ···]|

of the approximation (4.6.9):

               p = .2        p = .2          p = .5        p = .5
               N = n         N = [n^{4/3}]   N = n         N = [n^{4/3}]
               r = [n/4]     r = [n/4]       r = [n/2]     r = [n/2]
    n  (m, M)
    20  (1, 1)  .33           .35             .29           .27
        (2, 2)  .01           .01             .007          .006
    80  (1, 1)  .38           .32             .22           .20
        (2, 2)  .002          .003            .002          .0028
   200  (1, 1)  .42           .31             .20           .16
        (2, 2)  .0001         .0001           .0007         .0005
uniformly over the Borel sets. The main technical tool was an expansion of
one factor of the density (compare with the proof of (4.7.2)). The expansion
of the density was not given explicitly so as to concentrate our attention on the
result of statistical relevance, namely, the expansion of distributions.
The final section of this chapter is the proper place to give some explicit
formulas for expansions of densities with an error bound that is nonuniform
in x. By integration we shall also get inequalities which are relevant for
probabilities of moderate deviations.
Let again

    a_{r,n}² = r(n − r + 1)/(n + 1)³   and   b_{r,n} = r/(n + 1).
Denote again by U_{r:n} the rth order statistic of n i.i.d. (0, 1)-uniformly
distributed r.v.'s. From Lemma 3.1.1 we obtain

    P{a_{r,n}^{−1}|U_{r:n} − b_{r,n}| ≥ ε} ≤ 2 exp(−ε²/(3[1 + n^{−1} + ε/(a_{r,n}n)])),   ε > 0.    (4.7.1)
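Inequality (4.7.1) can be illustrated by simulation, using the fact that U_{r:n} has the Beta(r, n − r + 1) distribution (a sketch with our own variable names):

```python
import math
import random

random.seed(2)
n, r, reps, eps = 100, 50, 5000, 2.0
a = math.sqrt(r * (n - r + 1) / (n + 1) ** 3)     # a_{r,n}
b = r / (n + 1)                                    # b_{r,n}

# U_{r:n} is Beta(r, n - r + 1) distributed, so sample it directly
hits = sum(abs(random.betavariate(r, n - r + 1) - b) / a >= eps
           for _ in range(reps))
freq = hits / reps
bound = 2 * math.exp(-eps ** 2 / (3 * (1 + 1 / n + eps / (a * n))))
```

For n = 100, r = 50, ε = 2 the empirical frequency is close to the normal tail probability P{|Z| ≥ 2} ≈ 0.046, comfortably below the exponential bound.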
A refinement of this result will be obtained in the second part of this section.
Denote by g_{r,n} the density of a_{r,n}^{−1}(U_{r:n} − b_{r,n}), and by Φ and φ the standard
normal d.f. and density. The most simple "local limit theorem" is given by the
inequality

    |g_{r,n}(x) − φ(x)| ≤ Cφ(x)(1 + |x|³)(n/[r(n − r + 1)])^{1/2}.    (4.7.2)

Integrating (4.7.2) over a Borel set B one obtains [(4.7.3)–(4.7.5)]

    P{a_{r,n}^{−1}(U_{r:n} − b_{r,n}) ∈ B} = ∫_B g_{r,n}(x) dx = ∫_B φ(x) dx + O((n/[r(n − r)])^{1/2}),

where the final step is immediate by specifying B = {x: |x| ≤ log(r(n − r)/n)}
and applying (4.7.1) to ε = log(r(n − r)/n).
An expansion of length m can be established in the same way. For some
constant C_m > 0 we get

    |g_{r,n}(x) − φ(x)(1 + Σ_{i=1}^{m−1} L_{i,r,n}(x))| ≤ C_m φ(x)(n/[r(n − r + 1)])^{m/2}(1 + |x|^{3m}).    (4.7.6)
Assume now that the function

    x → a_{r,n}^{−1}(F[F^{−1}(b_{r,n}) + x a_{r,n}/f(F^{−1}(b_{r,n}))] − b_{r,n})

has m + 1 derivatives on the interval I_{r,n} := {x: |x| ≤ c_{r,n}} where log(r(n − r)/
n) ≤ c_{r,n} ≤ (r(n − r)/n)^{1/6}/2. Denote by f_{r,n} the density of

    a_{r,n}^{−1} f(F^{−1}(b_{r,n}))(X_{r:n} − F^{−1}(b_{r,n})).

Then

    |f_{r,n}(x) − φ(x)(1 + Σ_{i=1}^{m−1} L_{i,r,n}(S_{r,n}(x)))|
        ≤ C_m φ(x)(1 + |x|^{3m})[(n/[r(n − r)])^{m/2} + max_j |α_{j,r,n}|^{m/j}].    (4.7.7)
Moderate Deviations
We shall only study a simple application of (4.7.1). It will be shown that
the right-hand side of (4.7.1) can be replaced by a term C exp(−ε²/2)/ε for
certain ε. If ε ≤ (r(n − r + 1)/n)^{1/6}/2 then, with a suitable d,

    P{a_{r,n}^{−1}|U_{r:n} − b_{r,n}| ≥ ε}
        ≤ P{ε ≤ a_{r,n}^{−1}|U_{r:n} − b_{r,n}| ≤ d} + P{a_{r,n}^{−1}|U_{r:n} − b_{r,n}| ≥ d}
        ≤ C(n/[r(n − r + 1)])^{1/2} ∫_{|x| ≥ ε} (1 + |x|³)φ(x) dx
            + 2 exp(−d²/(3[1 + n^{−1} + d/(a_{r,n}n)]))
        ≤ C exp(−ε²/2)/ε.
P.4. Problems and Supplements

1. (i) The possible nondegenerate limiting d.f.'s of normalized central order
statistics are of four types: apart from location and scale parameters they are
pieced together from functions of the form Φ(cx^α) on the two half-lines, the
constants 0 and 1, and an atom of size 1/2.
(Smirnov, 1949)
(ii) For every d.f. H there exist a d.f. F and ranks r(n), n → ∞, having the
following property: Let X_{r(n):n} denote the r(n)th order statistic of n i.i.d.
random variables with common d.f. F. Then, the d.f. of a_n^{−1}(X_{r(n):n} − b_n)
converges weakly to H for certain a_n > 0 and b_n.
(Balkema and de Haan, 1978b)
(iii) The set of all d.f.'s F such that (ii) holds is dense in the set of d.f.'s w.r.t. the
topology of weak convergence.
(Balkema and de Haan, 1978b)
(iv) Let X_1, X_2, X_3, ... be a stationary, standard normal sequence with covariances ρ(n) = EX_1X_{n+1} satisfying the condition Σ_{n≥1}|ρ(n)| < ∞. Let r(n) ∈ {1, ..., n}
be such that r(n)/n → λ, n → ∞, where 0 < λ < 1. Denote by X_{r(n):n} the r(n)th
order statistic of X_1, ..., X_n. Then, for every x,

    P{a_n^{−1}(X_{r(n):n} − b_n) ≤ x} → Φ(x),   n → ∞,

for suitable constants a_n > 0 and b_n.
(Rootzén, 1985)
2. Let again N_{(μ,Σ)} be a k-variate normal distribution with mean vector μ and
covariance matrix Σ = (σ_{i,j}). Moreover, let I denote the unit matrix.
(i) Prove that

    ∥N_{(0,Σ)} − N_{(0,I)}∥ ≤ 2^{−1/2} [Σ_{i=1}^{k} (σ_{i,i} − 1) − log(det Σ)]^{1/2}.

(iii) Alternatively,

    ∥N_{(0,Σ)} − N_{(0,I)}∥ ≤ k2^{k+1} ∥Σ − I∥_2,

where ∥·∥_2 denotes the Euclidean norm.
(Pfanzagl, 1973b, Lemma 12)
(iv) Denote again by K the Kullback–Leibler distance. Prove that

    K(N_{(μ,I)}, N_{(0,I)}) = 2^{−1} ∥μ∥_2².
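For part (iv), the identity follows from log(f_μ/f_0)(x) = μ·x − ∥μ∥²/2, whose expectation under N_{(μ,I)} equals ∥μ∥²/2; a Monte Carlo sketch (variable names are ours):

```python
import random

random.seed(3)
mu = [0.5, -1.0, 2.0]
reps = 100000
norm2 = sum(m * m for m in mu)                 # |mu|_2^2

acc = 0.0
for _ in range(reps):
    x = [random.gauss(m, 1.0) for m in mu]
    # log-likelihood ratio: log(f_mu / f_0)(x) = mu . x - |mu|^2 / 2
    acc += sum(m * xi for m, xi in zip(mu, x)) - norm2 / 2
kl_mc = acc / reps                             # Monte Carlo estimate of K
kl_exact = norm2 / 2                           # = 2^{-1} |mu|_2^2
```

With the chosen μ, both values are close to 2.625.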
3. Let N_{(0,Σ)} be the k-variate normal distribution given in Lemma 4.4.1. Define
the linear map S as the inverse of the map T in (4.4.1) and compute its explicit
form.
(Reiss, 1975a)
4. (Spacings)
Given 1 ≤ r_1 < ··· < r_k ≤ n put again λ_i = r_i/(n + 1), σ_{i,j} = λ_i(1 − λ_j) for
i ≤ j, Σ = (σ_{i,j}), and f_i = F′(F^{−1}(λ_i)). Moreover, we introduce

    a_i² = σ_{i−1,i−1}/f_{i−1}² − 2σ_{i−1,i}/(f_{i−1}f_i) + σ_{i,i}/f_i²,   i = 1, ..., k,

and a quantity Δ built from the terms (λ_i − λ_{i−1})^{1/2}/(a_i f_i), i = 1, ..., k, and
1 − (1 − λ_k)^{1/2}.
(ii) Δ = 0 if k = 1.
(iii) Compute Δ if F is the uniform d.f. on (0, 1).
5. (Sample quantiles)
(i) sup_B |P{n^{1/2} f(F^{−1}(q)) σ^{−1} (X_{r(n):n} − F^{−1}(q)) ∈ B} − ∫_B dG_{r(n),n}| = O(n^{−m/2}),

where σ² = q(1 − q) and

    G_{r(n),n} = Φ + φ Σ_{i=1}^{m−1} n^{−i/2} S_{i,n}.

Here S_{1,n} is a polynomial of degree 2 in t whose coefficients involve (2q − 1)/σ,
σ f′(F^{−1}(q))/f²(F^{−1}(q)), (r(n) + 1 − nq)/σ, and 2(2q − 1)/(3σ).
(ii) A corresponding nonuniform bound, with an additional factor 1 + |x|^{3m},
holds on [−log n, log n].
6. (Asymptotic independence)
Given n i.n.n.i.d. random variables with d.f.'s F_1, ..., F_n, sample minimum and
sample maximum are asymptotically independent if

    ∏_{j=1}^{n} [F_j(y)(1 − F_j(x))] − ∏_{j=1}^{n} [F_j(y) − F_j(x)] → 0.

(Walsh, 1969)
Bibliographical Notes
Laplace (1818) derived the asymptotic normality of sample medians. He
computed the density of the sample median (within a more general framework)
and proved a limit theorem for the pointwise convergence of the densities. For
a discussion of this result and applications we refer to Stigler (1973). This
method was also used by Smirnov (1935) to obtain the asymptotic normality
of central order statistics in greater generality. Other approaches reduce the
problem to an application of the central limit theorem (that includes as a
special case the asymptotic normality of binomial r.v.'s). The reduction is
achieved either by means of the representations given in Section 1.6 (Cramér,
1946, and Rényi, 1953), the equality in (1.1.8) (Smirnov, 1949, van der Vaart,
1961, and Iglehart, 1976), or the Bahadur approximation (Sen, 1968).
The problem of characterizing the possible limiting d.f.'s of central order
statistics was dealt with by Smirnov (1949) (see P.4.1(i)) and Balkema and de
Haan (1978a, b). If no regularity conditions are imposed, every d.f. is a
limiting d.f. of central order statistics (see P.4.1(ii)).
An interesting problem, not treated in the book, occurs if the value of the
underlying density at the q-quantile is equal to zero or if the q-quantile is not
unique; in this context we refer to the articles of Feldman and Tucker (1966),
Kiefer (1969b), Umbach (1981), and Landers and Rogge (1985) for important
contributions.
A bound for the accuracy of the normal approximation to the d.f. of a single
order statistic was established by Reiss (1974a) (where the terms of the error
bound are given explicitly), Egorov and Nevzorov (1976), and Englund (1980).
Expansions of distributions of sample quantiles were established in Reiss
(1976). There it was merely assumed that the underlying d.f. F has derivatives
on (F^{−1}(q) − ε, F^{−1}(q)] and [F^{−1}(q), F^{−1}(q) + ε) for some ε > 0. If the left and
right derivatives of F at F^{−1}(q) are unequal, then the leading term of the
expansion is a certain mixture of normal distributions (compare this with
P.4.1(i)). In this context, we also refer to Weiss (1969c) who proved a limit
theorem under such conditions.
Puri and Ralescu (1986) studied order statistics of a non-random sample
size n and a random index which converges to q ∈ (0, 1) in probability. Among
other things, the asymptotic normality and a Berry–Esseen type theorem are
proved. A result concerning sample quantiles with random sample sizes,
related to that for maxima (see P.5.11(i)), does not seem to exist in the
literature.
The problem of asymptotic independence between different groups of order
statistics provides an excellent example where a joint treatment of extreme
and central order statistics is preferable. The asymptotic independence of
lower and upper extremes was first observed by Gumbel (1946). A precise
characterization of the conditions that guarantee the asymptotic independence is due to Rossberg (1965, 1967). The corresponding result in the strong
sense (that is, approximation w.r.t. the variational distance) was proved by
Ikeda (1963) and Ikeda and Matsunawa (1970). In the i.n.n.i.d. case, Walsh
(1969) proved the asymptotic independence of sample minimum and sample
maximum under the condition that one or several d.f.'s do not dominate the
other d.f.'s.
First investigations concerning the accuracy of the asymptotic results were
made by Walsh (1970). Sharp bounds of the variational distance in case of
extremes were established by Falk and Kohne (1986). Tiago de Oliveira (1961),
Rosengard (1962), Rossberg (1965), and Ikeda and Matsunawa (1970) proved
independence results that include central order statistics and sample means.
The sharp inequalities in Section 4.2 concerning extreme and central order
statistics are taken from Falk and Reiss (1988).
The asymptotic independence of ratios of consecutive order statistics was
proved by Lamperti (1964) and Dwass (1966); a corresponding result holds
for spacings. Smid and Stam (1975) showed that the condition, sufficient for
this result, is also necessary.
In Lemma 4.4.3 an upper bound of the distance between the normal
distribution N_{(0,I)} and a distribution induced by N_{(0,I)} and a function close to
the identity is computed. For related results we refer to Pfanzagl [1973a,
Lemma 1] and Bhattacharya and Ghosh [1978, Theorem 1]. These results are
formulated in terms of sequences of arbitrary normal distributions of a
fixed dimension and are therefore not applicable for our purposes. The normal
comparison lemma (see e.g. Leadbetter et al. (1983), Theorem 4.2.1) is related
to this.
For r(i) = r(i, n), i = 1, ..., k, satisfying the condition r(i, n)/n → q_i, n → ∞,
where 0 < q_1 < ··· < q_k < 1, the weak convergence of the standardized joint
distributions of the order statistics X_{r(i):n} to the normal distribution N_{(0,Σ)} was
proved by Smirnov (1935, 1944), Kendall (1940), and Mosteller (1946).
The normal distributions N_{(0,Σ)} are the finite dimensional marginals of
the "Brownian bridge" W°, which is a special Gaussian process with mean
function zero and covariance function EW°(q)W°(p) = q(1 − p) for 0 ≤ q ≤
p ≤ 1. The sample quantile process

    n^{1/2}(F_n^{−1}(q) − q),   q ∈ [0, 1],

here given for (0, 1)-uniformly distributed r.v.'s, converges to W° in distribution. Thus, the result for order statistics describes the weak convergence of the
finite dimensional marginals of the quantile process. For a short discussion
of this subject we refer to Serfling (1980). In view of the technique which is
needed to rigorously investigate the weak convergence of the quantile process,
a detailed study has to be done in conjunction with empirical processes in
general (see e.g. M. Csörgő and P. Révész (1981) and G.R. Shorack and
J.A. Wellner (1986)). The invariance principle for the sample quantile process
provides a powerful tool to establish limit theorems (in the weak sense) for
functionals of the sample quantile process; however, one cannot indicate the
rate at which the limit theorems are valid. For statistical applications of the
quantile process we refer to M. Csörgő (1983) and Shorack and Wellner (1986).
Weiss (1969b) studied the normal approximation of joint distributions of
central order statistics w.r.t. the variational distance under the condition that
k = k(n) is of order O(n^{1/4}). Ikeda and Matsunawa (1972) and Weiss (1973)
obtained corresponding results under the weaker condition that k(n) is of
order O(n^{1/3}). Reiss (1975a) established the asymptotic normality with a bound
of order O((Σ_{i=1}^{k+1} (r_i − r_{i−1})^{−1})^{1/2}) for the remainder term. We also refer to Reiss
(1975a) for an expansion of the joint distribution of central order statistics (see
Section 4.5 for an expansion of length two in the special case of exponential
r.v.'s). Other notable articles pertaining to this are those of Matsunawa (1975),
Weiss (1979a), and Ikeda and Nonaka (1983).
An approximation to the multinomial distribution, with an increasing
number of cells as the sample size tends to infinity, by means of the distribution
of certain rounded-off normal r.v.'s may be found in Weiss (1976); this method
seems to be superior to a more direct approximation by means of a normal
distribution as pointed out by Weiss (1978).
The expansions of d.f.'s of order statistics in Section 4.6, taken from Nowak
and Reiss (1983), are refinements of those given by Ivchenko (1971, 1974).
Ivchenko also considers the multivariate case. In conjunction with this, we
mention the article of Kolchin (1980), who established corresponding results
for extremes.
CHAPTER 5
Approximations to Distributions
of Extremes
The nondegenerate limiting d.f.'s of sample maxima X_{n:n} are the Fréchet d.f.'s
G_{1,α}, the Weibull d.f.'s G_{2,α}, and the Gumbel d.f. G_3. Thus, with regard to the
variety of limiting d.f.'s, the situation of the present chapter turns out to be
more complex than that of the preceding chapter, where weak regularity
conditions guarantee the asymptotic normality of the order statistics.
As stated in (1.3.11) the limiting d.f.'s are max-stable, that is, for G ∈
{G_{1,α}, G_{2,α}, G_3: α > 0} we find c_n > 0 and reals d_n such that

    G^n(d_n + xc_n) = G(x).

For the corresponding generalized Pareto d.f. W one has

    W^n(d_n + xc_n) = G(x) + O(n^{−1})

where c_n and d_n are the constants for which G^n(d_n + xc_n) = G(x).
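For the Gumbel d.f. the max-stability relation is explicit with c_n = 1 and d_n = log n, since G_3^n(x + log n) = exp(−n e^{−x−log n}) = G_3(x); a short numerical confirmation (function name is ours):

```python
import math

def G3(x):
    # Gumbel d.f.: G_3(x) = exp(-exp(-x))
    return math.exp(-math.exp(-x))

# max-stability: G_3^n(d_n + x c_n) = G_3(x) with c_n = 1, d_n = log n
diffs = [abs(G3(x + math.log(n)) ** n - G3(x))
         for n in (10, 1000, 10 ** 6) for x in (-1.0, 0.0, 2.5)]
```

Up to floating-point rounding the identity is exact for every n and x.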
between the exact and limiting distributions will be measured w.r.t. the
Hellinger distance.
In Section 5.3 some preparations are made for the study of the joint
distribution of the k largest order statistics; it is shown that there is a close
connection between the limiting distributions of the kth largest order statistic
X_{n−k+1:n} and the k largest order statistics.
Higher order approximations in case of extremes of generalized Pareto
r.v.'s are studied in Section 5.4. The accuracy of the approximations to the
distribution of the kth largest order statistic and the joint distribution of
extreme order statistics is dealt with in Section 5.5.
Finally, in Section 5.6, we shall make some remarks about the connection
between extreme order statistics, empirical point processes, and certain
Poisson processes.
If

    F^n(d_n + xc_n) → G(x),   n → ∞,    (5.1.1)

for every continuity point of the nondegenerate limiting d.f. G, then G has to
be of the type G_{1,α}, G_{2,α}, G_3 for some α > 0.
Recall that G_{1,α}(x) = exp(−x^{−α}) for x > 0, G_{2,α}(x) = exp(−(−x)^α) for
x < 0, and G_3(x) = exp(−e^{−x}) for every x. The corresponding densities are

    g_{1,α}(x) = αx^{−(1+α)} exp(−x^{−α}),   0 < x,
    g_{2,α}(x) = α(−x)^{α−1} exp(−(−x)^α),   x < 0,
    g_3(x) = e^{−x} exp(−e^{−x}).
Figure 5.1.1. Fréchet densities g_{1,α} for several parameters α.
Fréchet Densities

Figure 5.1.1 is misleading insofar as one density seems to have a pole at zero.
A closer look shows that this is not the case. Moreover, from the definition
of g_{1,α} it is evident that every Fréchet density is infinitely often differentiable.
For α = 5 the density already looks like a Gumbel density (compare with
Figure 1.3.1).
The density g_{1,α} is unimodal with mode

    m(1, α) = (α/(1 + α))^{1/α}.

Moreover, m(1, α) → 0 as α → 0, and

    m(1, α) → 1,   g_{1,α}(m(1, α)) → ∞,   as α → ∞.
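The mode formula m(1, α) = (α/(1 + α))^{1/α} can be confirmed by a grid search over the Fréchet density (a sketch; names are ours):

```python
import math

def g1(alpha, x):
    # Frechet density g_{1,alpha}(x) = alpha x^{-(1+alpha)} exp(-x^{-alpha}), x > 0
    return alpha * x ** (-(1.0 + alpha)) * math.exp(-x ** (-alpha))

def mode(alpha):
    # m(1, alpha) = (alpha / (1 + alpha))^{1/alpha}
    return (alpha / (1.0 + alpha)) ** (1.0 / alpha)

alpha = 2.0
grid = [0.001 * i for i in range(1, 5000)]
xmax = max(grid, key=lambda x: g1(alpha, x))   # grid argmax of the density
```

For α = 2 the argmax lies within a grid step of m(1, 2) = (2/3)^{1/2} ≈ 0.8165, and mode(α) → 1 as α grows.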
Weibull Densities

The "negative" standard exponential density g_{2,1} possesses a central position
within the family of Weibull densities. The Weibull densities are again
unimodal. From the visual as well as the statistical point of view the most
significant characteristic of a Weibull density g_{2,α} is its behavior at zero (Figure
5.1.2). Notice that

    g_{2,α}(x) ~ α(−x)^{α−1},   x ↑ 0.

One may distinguish between five different classes of Weibull densities as
far as the behavior at zero is concerned: for α < 1 the density has a pole at
zero, for α = 1 a jump, for 1 < α < 2 it tends to zero with a vertical tangent,
for α = 2 with a finite nonzero slope, and for α > 2 with a horizontal tangent.
Moreover,

    m(2, α) → 0,   g_{2,α}(m(2, α)) → 1,   as α → 1,

and

    m(2, α) → −1,   g_{2,α}(m(2, α)) → ∞,   as α → ∞.
Gumbel Density

The Gumbel density g_3(x) = e^{−x} exp(−e^{−x}) behaves approximately like the
standard exponential density e^{−x} as x → ∞. The mode of g_3 is equal to zero.
For the graph of g_3 we refer to Figure 1.3.1.
As mentioned above, c_n^{−1}(X_{n:n} − d_n) has the d.f. G_{i,α} if F = G_{i,α} and if the
constants are appropriately chosen. Thus, e.g., the sample maximum X_{n:n} of
the negative exponential d.f. G_{2,1} may serve as a starting point for the study
of asymptotic distributions of sample maxima. However, to extend such a
result one has to use the transformation technique (or some equivalent more
direct method), so that it can be preferable to work with the sample maximum
U_{n:n} or V_{n:n} of n i.i.d. random variables uniformly distributed on (0, 1) or,
respectively, (−1, 0). In this case the limiting d.f. will again be G_{2,1}. Recall that
the uniform distribution on (−1, 0) is the generalized Pareto distribution W_{2,1}.
As pointed out in (1.3.14) we have

    P{n(U_{n:n} − 1) ≤ x} = P{nV_{n:n} ≤ x} → G_{2,1}(x),   n → ∞,    (5.1.2)

for every x; by (5.1.3), this convergence even holds in the strong sense.
Moreover, (5.1.2) and Corollary 1.2.7 imply the corresponding convergence of
the densities on (α(G), ω(G)). This yields

    F^n(b_n + xa_n) → G(x),   n → ∞,   for every x,    (5.1.4)

for suitable constants a_n > 0 and b_n whenever F satisfies one of the following
classical tail conditions:

    (1, α): ω(F) = ∞,   lim_{t→∞} [1 − F(tx)]/[1 − F(t)] = x^{−α},   x > 0;    (5.1.5)

    (2, α): ω(F) < ∞,   lim_{t↓0} [1 − F(ω(F) + xt)]/[1 − F(ω(F) − t)] = (−x)^α,   x < 0;    (5.1.6)

    (3):   lim_{t↑ω(F)} [1 − F(t + xg(t))]/[1 − F(t)] = e^{−x},   −∞ < x < ∞,    (5.1.7)

for an appropriate positive function g.
Possible choices of the normalizing constants are

    (1, α): b_n* = 0,   a_n* = F^{−1}(1 − 1/n);    (5.1.8)
    (2, α): b_n* = ω(F),   a_n* = ω(F) − F^{−1}(1 − 1/n);    (5.1.9)
    (3):   b_n* = F^{−1}(1 − 1/n),   a_n* = g(b_n*).    (5.1.10)

Displays (5.1.11)–(5.1.17) collect equivalent formulations of the weak convergence of normalized maxima (in terms of the tail 1 − F(b_n + xa_n), of d.f.'s,
and of the corresponding order statistics) which will be used in the sequel.

Theorem 5.1.1. Let X_{n:n} be the sample maximum of n i.i.d. r.v.'s with common
generalized Pareto d.f. W. Then the distribution of c_n^{−1}(X_{n:n} − d_n) converges to
the corresponding extreme value distribution G w.r.t. the variational distance,
where c_n and d_n are the constants with G^n(d_n + xc_n) = G(x).
PROOF. From Theorem 1.3.2 we deduce that n(U_{n:n} − 1) has the density
f_n given by f_n(x) = (1 + x/n)^{n−1} for −n < x < 0, and = 0, otherwise. Thus,
f_n(x) → e^x = g_{2,1}(x), n → ∞, x < 0, and hence the Scheffé lemma implies the
assertion. □
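The pointwise convergence used in the proof is easy to see numerically: the density f_n(x) = (1 + x/n)^{n−1} approaches e^x as n grows (a sketch with our own function name):

```python
import math

def f_n(n, x):
    # density of n(U_{n:n} - 1): (1 + x/n)^{n-1} on (-n, 0), zero elsewhere
    return (1.0 + x / n) ** (n - 1) if -n < x < 0 else 0.0

x = -1.5
errs = [abs(f_n(n, x) - math.exp(x)) for n in (10, 100, 10000)]
```

At x = −1.5 the error drops from about 8·10^{−3} (n = 10) to below 10^{−4} (n = 10000).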
Next, we study conditions under which F belongs to the strong domain of
attraction of an extreme value d.f. The key role is played by the generalized
Pareto densities

    w = g/G

on appropriate intervals, where G is the corresponding extreme value d.f. and
g = G′. Explicitly, we have

    w_{1,α}(x) = 0 if x < 1, and = αx^{−(1+α)} if x ≥ 1;   "Pareto"    (5.1.18)

    w_{2,α}(x) = 0 if x < −1, = α(−x)^{α−1} if −1 ≤ x ≤ 0, and = 0 if x > 0;   "Type II"    (5.1.19)

    w_3(x) = 0 if x < 0, and = e^{−x} if x ≥ 0.   "Exponential"    (5.1.20)
The generalized Pareto densities as well as the extreme value densities are
unimodal. The particular feature of the generalized Pareto densities is the tail
equivalence to the corresponding extreme value densities at the right endpoint of the support.
The counterpart to Theorem 5.1.1, with respect to the strong convergence, is the following.
Lemma 5.1.3. Assume that the constants a_n > 0 and b_n are chosen so that the
weak convergence holds, that is, F^n(b_n + xa_n) → G(x), n → ∞, for every x, where
G ∈ {G_{1,α}, G_{2,α}, G_3: α > 0}. Then,

    ∥P{a_n^{−1}(X_{n:n} − b_n) ∈ ·} − G∥ → 0,   n → ∞,    (5.1.21)

if, and only if, for every subsequence i(n) there exists a further subsequence k(n)
such that

    a_{k(n)} f(b_{k(n)} + x a_{k(n)})/[c_{k(n)} w(d_{k(n)} + x c_{k(n)})] → 1,   n → ∞,    (5.1.22)

for Lebesgue almost all x ∈ (α(G), ω(G)), where w is the corresponding generalized Pareto density and c_n and d_n are the constants of Theorem 5.1.1.
Condition (5.1.22) may be replaced by

    n a_{k(n)} f(b_{k(n)} + x a_{k(n)}) F^{k(n)−1}(b_{k(n)} + x a_{k(n)}) → g(x),   n → ∞,    (5.1.22′)

for almost all x ∈ (α(G), ω(G)). The equivalence of (5.1.22) and (5.1.22′)
becomes obvious by noting that d_n + xc_n ∈ (α(W), ω(W)) for every x ∈ (α(G),
ω(G)), and ψ(x)/n = c_n w(d_n + xc_n), eventually.
Without the condition F^n(b_n + xa_n) → G(x), n → ∞, (5.1.22) does not necessarily imply (5.1.21), as can be shown by examples. If the weak convergence
holds, then a sufficient condition for the convergence w.r.t. the variational
distance is

    a_n f(b_n + xa_n)/[c_n w(d_n + xc_n)] → 1,   n → ∞,   x ∈ (α(G), ω(G)).    (5.1.23)

Note that the rate of convergence in (5.1.23) will also determine the rate at
which the strong convergence of the distributions holds. We remark that the
generalized Pareto density w can be replaced by the density g of G in condition
(5.1.23). Notice that (5.1.23) is equivalent to

    n a_n f(b_n + xa_n) F^{n−1}(b_n + xa_n) → g(x),   n → ∞,   x ∈ (α(G), ω(G)).    (5.1.23′)

PROOF OF LEMMA 5.1.3. Since

    x → n a_n f(b_n + xa_n) F^{n−1}(b_n + xa_n)

is the density of a_n^{−1}(X_{n:n} − b_n), it is immediate from the Scheffé Lemma 3.3.4
that (5.1.21) is equivalent to (5.1.22′). □
Lemma 5.1.3 will be the decisive tool to prove the following equivalence:
F belongs to the strong domain of attraction of an extreme value distribution
if, and only if, the corresponding result holds for the joint distribution of
the k largest extremes for every positive integer k. For details we refer to
Section 5.3.
From the mathematical point of view, condition (5.1.22) is more satisfactory
than the sufficient condition (5.1.23). However, for practical purposes condition (5.1.23) can be useful, e.g. to verify that a given d.f. belongs to the strong
domain of attraction of a particular extreme value distribution G ∈ {G_{1,α}, G_{2,α},
G_3: α > 0}. It was proved by Falk (1985a) that the von Mises conditions
(5.1.24) imply (5.1.23) and that (5.1.23) implies the convergence in the strong
sense. Sweeting (1985) was able to show that the von Mises conditions (5.1.24)
are equivalent to the uniform convergence of the densities in (5.1.23′) on finite
intervals if the density f is positive on a left neighborhood of ω(F).
The von Mises conditions are:

(1, α): ω(F) = ∞ and lim_{t→∞} t f(t)/(1 − F(t)) = α;

(2, α): ω(F) < ∞ and lim_{t↑ω(F)} (ω(F) − t) f(t)/(1 − F(t)) = α;   (5.1.24)

(3): ∫^{ω(F)} (1 − F(u)) du < ∞ and

    lim_{t↑ω(F)} f(t) ∫_t^{ω(F)} (1 − F(u)) du / [1 − F(t)]² = 1.

For d.f.'s having two derivatives there is the further set of conditions

    lim_{t↑ω(F)} [(1 − F)/f]′(t) = 1/α if i = 1,  −1/α if i = 2,  0 if i = 3.   (5.1.25)
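For the standard normal d.f. the derivative in (5.1.25), i = 3, can be written out as [(1 − Φ)/φ]′(t) = t(1 − Φ(t))/φ(t) − 1 (using φ′ = −tφ), so the von Mises condition can be checked by direct evaluation. A minimal sketch, with arbitrary evaluation points:

```python
import math

def mills_deficit(t):
    """[(1 - Phi)/phi]'(t) = t (1 - Phi(t))/phi(t) - 1 for the standard normal;
    the von Mises condition (5.1.25), i = 3, requires this to tend to 0."""
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))           # 1 - Phi(t)
    phi = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    return t * tail / phi - 1.0

vals = [abs(mills_deficit(t)) for t in (2.0, 5.0, 10.0, 20.0)]
```

The quantity behaves like −1/t² for large t, which is the Mills-ratio asymptotics alluded to around (5.1.27).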
We have

    an = 1/(n f(bn))   (5.1.26)

with bn = F⁻¹(1 − 1/n). For the standard normal d.f. Φ, condition (5.1.25), i = 3, leads to the expression

    x(1 − Φ(x))/φ(x) − 1.   (5.1.27)

It is immediate from (3.2.3) that this expression tends to zero as x → ∞. Thus, condition (5.1.25), i = 3, implies that Φ belongs to the domain of attraction of the Gumbel d.f. G3. Hence, according to (5.1.26), with bn = Φ⁻¹(1 − 1/n) and an = 1/(nφ(bn)),

    sup_B |P{an⁻¹(Xn:n − bn) ∈ B} − G3(B)| → 0,  n → ∞.

A closer analysis, based on the defining equations 1 − Φ(bn) = 1/n and, alternatively, nφ(bn) = bn, shows that this distance is of order

    O((log log n)²/log n).

We remark that this bound is sharp. Moreover, the same rates are obtained if d.f.'s are considered.
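The slow rate for normal maxima can be observed directly: with bn = Φ⁻¹(1 − 1/n) and an = 1/(nφ(bn)) the sup-distance between Φⁿ(bn + an·) and G3 decays only logarithmically in n. A sketch (the quantile is obtained by simple bisection; the grid and sample sizes are arbitrary choices):

```python
import math

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def quantile(p, lo=-10.0, hi=10.0):
    """Phi^{-1}(p) by bisection (accurate enough for this illustration)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def gumbel_gap(n):
    """sup_x |Phi^n(b_n + a_n x) - G3(x)| over a grid, with
    b_n = Phi^{-1}(1 - 1/n) and a_n = 1/(n phi(b_n))."""
    bn = quantile(1.0 - 1.0 / n)
    an = 1.0 / (n * phi(bn))
    grid = [i / 10.0 for i in range(-30, 101)]  # x in [-3, 10]
    return max(abs(Phi(bn + an * x) ** n - math.exp(-math.exp(-x))) for x in grid)

gaps = {n: gumbel_gap(n) for n in (10 ** 2, 10 ** 4, 10 ** 6)}
```

Even at n = 10⁶ the distance remains of the order of a percent, in line with the O(1/log n) rate discussed in the text.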
In the present case,

    P{an⁻¹(Xn:n − bn) ≤ x} → Gi,α(x),  n → ∞,

and, more generally, for the kth largest order statistic,

    P{an⁻¹(Xn−k+1:n − bn) ≤ x} → Gi,α,k(x),  n → ∞,   (5.1.28)

where

    G1,α,k(x) = exp(−x^{−α}) Σ_{j=0}^{k−1} x^{−jα}/j!,  x > 0,

    G2,α,k(x) = exp(−(−x)^α) Σ_{j=0}^{k−1} (−x)^{jα}/j!,  x < 0,   (5.1.29)

    G3,k(x) = exp(−e^{−x}) Σ_{j=0}^{k−1} e^{−jx}/j!,  −∞ < x < ∞.
In short,

    Gi,α,k = Gi,α Σ_{j=0}^{k−1} (−log Gi,α)^j/j!.   (5.1.30)

To see this, note that with un = bn + an x and n(1 − F(un)) → t,

    P{Xn−k+1:n ≤ un} = P{Σ_{j=1}^n 1_{(un,∞)}(ξj) ≤ k − 1} → P_t{0, 1, ..., k − 1},   (5.1.31)

where P_t denotes the Poisson distribution with parameter t > 0. Thus, (5.1.28) holds.
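The Poisson argument in (5.1.31) can be illustrated numerically: with n(1 − F(un)) = t, the exceedance count is binomial with parameters n and t/n, and its d.f. at k − 1 approaches the Poisson d.f. A minimal sketch (the values of t and k are arbitrary choices):

```python
import math

def binom_cdf(n, p, m):
    """P{Bin(n, p) <= m}, accumulated from the pmf via the standard ratio recursion."""
    total, pmf = 0.0, (1.0 - p) ** n
    for j in range(m + 1):
        total += pmf
        pmf *= (n - j) / (j + 1) * p / (1.0 - p)
    return total

def poisson_cdf(t, m):
    """P_t{0, ..., m} for a Poisson r.v. with parameter t."""
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(m + 1))

# exceedance count over u_n with n(1 - F(u_n)) = t
t, k = 3.0, 3
errs = [abs(binom_cdf(n, t / n, k - 1) - poisson_cdf(t, k - 1)) for n in (20, 200, 2000)]
```

The discrepancy shrinks as n grows with t fixed, which is exactly why the limit law of the kth largest order statistic involves Poisson probabilities.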
Moreover, it is well known that every nondegenerate limiting d.f. of the kth largest order statistic Xn−k+1:n has to be one of the d.f.'s in (5.1.29) (see e.g. Galambos (1987), Theorem 2.8.1) where it is always understood that we have to include a location and scale parameter if the d.f. of Xn−k+1:n is not properly standardized.

Note that in analogy to (1.3.15) the nondegenerate limiting d.f.'s Fi,α,k of the kth smallest order statistics Xk:n are given by

    F1,α,k(x) = 1 − G1,α,k(−x),  x < 0,

    F2,α,k(x) = 1 − G2,α,k(−x),  x > 0,   (5.1.32)

and = 0, otherwise.
We also note the explicit form of the densities gi,α,k of Gi,α,k. Since

    Gi,α,k = G2,1,k(log Gi,α)  on  (α(Gi,α), ω(Gi,α)),

we know that

    gi,α,k(x) = g2,1,k(log Gi,α(x)) gi,α(x)/Gi,α(x).   (5.1.33)

Hence,

    g1,α,k(x) = α exp(−x^{−α}) x^{−(αk+1)}/(k − 1)!,  x > 0,

    g2,α,k(x) = α exp(−(−x)^α) (−x)^{αk−1}/(k − 1)!,  x < 0,   (5.1.34)

    g3,k(x) = exp(−e^{−x}) e^{−kx}/(k − 1)!,  −∞ < x < ∞.

Notice that

    gi,α,k = gi,α (−log Gi,α)^{k−1}/(k − 1)!.
Lemma 1.6.6 yields that G2,1,k is the d.f. of the partial sum Sk = Σ_{i=1}^k ξi where ξ1, ..., ξk are i.i.d. random variables with common d.f. F(x) = eˣ, x < 0.
Next it will be proved that n(Un−k+1:n − 1) is asymptotically distributed according to G2,1,k (in other words, can asymptotically be represented by Sk). As an extension of Lemma 5.1.2 we obtain

    sup_B |P{n(Un−k+1:n − 1) ∈ B} − G2,1,k(B)| → 0,  n → ∞.   (5.1.35)

Notice that the density of n(Un−k+1:n − 1) equals

    Π_{i=1}^{k−1} (1 − i/n) · (1 + x/n)^{n−k} (−x)^{k−1}/(k − 1)!,  −n < x < 0,

and = 0, otherwise.
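This approximation can be checked numerically: the exact density Π_{i=1}^{k−1}(1 − i/n)(1 + x/n)^{n−k}(−x)^{k−1}/(k − 1)! of n(U_{n−k+1:n} − 1) approaches g2,1,k(x) = eˣ(−x)^{k−1}/(k − 1)! as n grows. A sketch (the grid, k, and the sample sizes are arbitrary choices):

```python
import math

def density_nU(n, k, x):
    """Exact density of n(U_{n-k+1:n} - 1) for n i.i.d. uniform(0,1) r.v.'s:
    prod_{i=1}^{k-1}(1 - i/n) * (1 + x/n)^{n-k} * (-x)^{k-1}/(k-1)! on (-n, 0)."""
    if not -n < x < 0:
        return 0.0
    prod = 1.0
    for i in range(1, k):
        prod *= 1.0 - i / n
    return prod * (1.0 + x / n) ** (n - k) * (-x) ** (k - 1) / math.factorial(k - 1)

def g21k(k, x):
    """Density of S_k, the sum of k i.i.d. 'negative' exponentials: e^x (-x)^{k-1}/(k-1)!."""
    return math.exp(x) * (-x) ** (k - 1) / math.factorial(k - 1) if x < 0 else 0.0

k = 3
grid = [-i / 10.0 for i in range(1, 101)]  # x in (-10, 0)

def sup_err(n):
    return max(abs(density_nU(n, k, x) - g21k(k, x)) for x in grid)

errors = [sup_err(n) for n in (20, 200, 2000)]
```

The sup-error decreases at a rate of order k/n, matching the remainder term announced for Lemma 5.1.5 in Section 5.4.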
Obviously, Lemma 5.1.5 can be written

    sup_B |P{nVn−k+1:n ∈ B} − G2,1,k(B)| → 0,  n → ∞,   (5.1.36)

where Vn−k+1:n is the kth largest order statistic of n i.i.d. random variables that are uniformly distributed on (−1, 0).

Recall that the uniform distribution on (−1, 0) is the generalized Pareto distribution W2,1. (5.1.36) can easily be extended to the other generalized Pareto distributions Wi,α by using the transformation technique.

Let again Ti,α be defined as in (1.6.10). For x < 0, we have T1,α(x) = (−x)^{−1/α}, T2,α(x) = −(−x)^{1/α} and T3(x) = −log(−x).

Since Ti,α(nVr:n) = cn⁻¹(Xr:n − dn) where cn, dn are the constants of Theorem 5.1.1 and since Gi,α,k is induced by G2,1,k and Ti,α [recall that Ti,α⁻¹ = G2,1⁻¹ ∘ Gi,α = log Gi,α] the following result is immediate from Lemma 5.1.5.

Corollary 5.1.6. Let Xn−k+1:n be the kth largest order statistic of n i.i.d. random variables with common generalized Pareto d.f. W ∈ {W1,α, W2,α, W3 : α > 0}.
Then,

    sup_B |P{cn⁻¹(Xn−k+1:n − dn) ∈ B} − Gi,α,k(B)| → 0,  n → ∞.   (5.1.37)

In Section 5.4 it will be shown that Lemma 5.1.5 (and thus also Corollary 5.1.6) is valid with a remainder term of order O(k/n).
where C > 0 is a universal constant, and ar,n > 0 and br,n are normalizing
constants. In Section 5.4 it will be proved that
(5.1.41)
n-
00.
(5.1.42)
    sup_B |Pᴺ(B) − Gᴺ(B)| ≤ N^{1/2} H(P, G),   (5.2.1)

where the Hellinger distance H(P, G) is given by

    H(P, G) = [∫ (f^{1/2}(x) − g^{1/2}(x))² dx]^{1/2}   (5.2.2)

if f and g are Lebesgue densities of P and G and, more generally, by

    H(P, G) = [∫ (f^{1/2} − g^{1/2})² dμ]^{1/2}   (5.2.3)

if f and g are densities w.r.t. a measure μ.
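The Hellinger distance can be evaluated by numerical integration; for two unit-variance normal densities the closed form H²(P, G) = 2(1 − exp(−μ²/8)) is available as a check. A minimal sketch (the shift μ and the integration grid are arbitrary choices, not taken from the text):

```python
import math

def hellinger(f, g, lo, hi, m=20000):
    """H(P, G) = [int (f^{1/2} - g^{1/2})^2 dx]^{1/2}, midpoint rule on (lo, hi)."""
    h = (hi - lo) / m
    s = sum((math.sqrt(f(lo + (i + 0.5) * h)) - math.sqrt(g(lo + (i + 0.5) * h))) ** 2
            for i in range(m))
    return math.sqrt(s * h)

def normal_pdf(mu):
    return lambda x: math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

mu = 0.5
H = hellinger(normal_pdf(0.0), normal_pdf(mu), -10.0, 10.0)
H_exact = math.sqrt(2.0 * (1.0 - math.exp(-mu * mu / 8.0)))
```

Combined with (5.2.1), such a value of H(P, G) immediately bounds the variational distance between the corresponding product measures by N^{1/2} H(P, G).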
An Auxiliary Approximation

According to (5.1.4),

    Fⁿ(bn + an x) → G(x),  n → ∞,   (5.2.4)

holds if, and only if,

    n(1 − F(bn + an x)) → h(x),  n → ∞,

with h = −log G.
F:
Lemma 5.2.1. There exists a universal constant C > 0 such that for every n and all d.f.'s F and G the following inequality holds:
+ C/n
holds if
+ H(Dn, G)
First, (1) will be verified in the special case of F0(x) = 1 + x/n, −n < x < 0. Notice that F0 is the d.f. of n(Un:n − 1). In this case we have Dn(x) ≡ D0,n(x) = (eˣ − e⁻ⁿ)/(1 − e⁻ⁿ), −n < x < 0, and, therefore, D0,n is the normalized restriction of the extreme value d.f. G2,1 to the interval (−n, 0).
Denote by f0 and d0,n the densities of F0 and D0,n. Since the Hellinger distance can be bounded by means of

    [∫ (n f0 F0^{n−1}/d0,n − 1)² dD0,n]^{1/2}

(see Lemma 3.3.9(ii)) it is immediate that (1) holds for F0 and D0,n if

    ∫_{−n}^0 [e⁻ˣ(1 + x/n)^{n−1}(1 − e⁻ⁿ) − 1]² dD0,n(x) ≤ C/n².   (2)
    ≤ C/n.   (5.2.5)
    H(Fⁿ, G) ≤ [2G(Bᶜ) + ∫_B [nf/ψ − 1 − log(nf/ψ)] dG − ∫_B (1 + log G) dG + ∫_{{g=0}} nG dF]^{1/2} + C/n   (5.2.6)

and

    H(Fⁿ, G) ≤ [2G(Bᶜ) + ∫_B [n(1 − F) − log(nf) + log(Gψ)] dG]^{1/2} + C/n.
To verify this, write

    ∫_B (1 − F) dG = ∫_B (∫_x^∞ f(y) dy) dG(x)   (1)
    = ∫∫ 1_{[x0,∞)}(x) 1_{(−∞,y)}(x) f(y) g(x) dx dy
    = ∫ f(y) (∫ 1_{[x0,∞)}(x) 1_{(−∞,y)}(x) g(x) dx) dy
    = ∫_B (f/ψ) dG + ∫_{{g=0}} G(y) dF(y).   (2)
In the special case x0 = −∞ one obtains

    H(Fⁿ, G) ≤ [∫ (nf/ψ − 1 − log(nf/ψ)) dG]^{1/2} + C/n.   (5.2.7)

PROOF.
First,

    ∫ log G dG = −1,   (1)

and hence

    ∫ (1 + log G) dG = ∫_0^1 (1 + log x) dx = x log x |_0^1 = 0,

since x log x → 0 as x → 0, where we substituted x = (G ∘ G⁻¹)(x).
    H(Fⁿ, G) ≤ [∫ (nf/ψ − 1)² dG]^{1/2} + C/n   (5.2.8)

where again ψ = g/G. This inequality shows once more (see also Section 5.1) that the approximating d.f. G should be chosen in such a way that nf/ψ is close to one.
EXAMPLE 5.2.4. Let F(x) = Φ(bn + bn⁻¹x) where Φ is the standard normal d.f. and bn is the solution of bn = nφ(bn) with φ = Φ′. Then,

    H(Fⁿ, G3) = O(bn⁻²) = O(1/log n).   (5.2.9)

Indeed, since nf/ψ3 = nφ(bn + x/bn)/(bn e⁻ˣ) = exp(−x²/2bn²), we obtain from (5.2.7) that

    H(Fⁿ, G3) ≤ [∫ (exp(−x²/2bn²) − 1 + x²/2bn²) dG3(x)]^{1/2} + C/n.
Recall that ψ = g/G for G ∈ {G1,α, G2,α, G3 : α > 0} is given by

    ψ1,α(x) = α x^{−(1+α)},  x > 0,

    ψ2,α(x) = α(−x)^{α−1},  x < 0,   (5.2.10)

    ψ3(x) = e^{−x},  −∞ < x < ∞.
Theorem 5.2.5. Let G ∈ {G1,α, G2,α, G3 : α > 0}. Let F be a d.f. with density f such that f(x) > 0 for x0 < x < ω(F), and = 0, otherwise. Assume that ω(F) = ω(G). Then,

    H(Fⁿ, G) ≤ [∫_{x0}^{ω(G)} [nf/ψ − 1 − log(nf/ψ)] dG + 2G(x0) − G(x0) log(G(x0))]^{1/2} + C/n,

where

    2G(x0) − G(x0) log(G(x0)) = Gi,α(x0) + Gi,α,2(x0).
Limit Distributions

The results above provide us with useful auxiliary inequalities which, in a next step, have to be applied to special examples or certain classes of underlying d.f.'s to obtain a more explicit form of the error bound.

Our first example again reveals the exceptional role of the generalized Pareto d.f.'s Wi,α (at least, from a technical point of view).
EXAMPLE 5.2.6. (i) Let W ∈ {W1,α, W2,α, W3 : α > 0} and let cn, dn be the constants of Theorem 5.1.1. Put

    Fn(x) = W(dn + x cn).

Since cn w(dn + x cn) = ψ(x)/n it follows that

    ∫_{{fn>0}} (nfn/ψ − 1 − log(nfn/ψ)) dG = 0.

(ii) Let in (i) the generalized Pareto d.f. W be replaced by a d.f. F which has the same tail as W. More precisely,

    f(x) = w(x),  x ≥ x0.
Notice that the condition T(x0) < x in Example 5.2.6(ii) makes the accuracy of the approximation independent of the special underlying d.f. F. Example 5.2.6 will be generalized to classes of d.f.'s which include the generalized Pareto d.f.'s as well as the extreme value d.f.'s. Since our calculations are always carried out within an error bound of order O(n⁻¹) it is clear that the estimates will be inaccurate for extreme value d.f.'s.
Assume that the underlying density f is of the form

    f = ψ e^h   (5.2.11)

where, for some constants L and δ > 0,

    |h(x)| ≤ L x^{−αδ}  if i = 1,

    |h(x)| ≤ L(−x)^{αδ}  if i = 2,   (5.2.12)

    |h(x)| ≤ L e^{−δx}  if i = 3.

Then,

    H(Fⁿ, G) ≤ D n^{−δ}  if 0 < δ ≤ 1,  and  ≤ D n^{−1}  if δ > 1.   (5.2.13)
    ∫_{nx0}^0 (nfn/ψ2,1 − 1 − log(nfn/ψ2,1)) dG2,1 = ∫_{nx0}^0 (e^{h(x/n)} − 1 − h(x/n)) dG2,1(x) ≤ D̃ n^{−2δ}

where D̃ only depends on x0, L and δ. Now the assertion is immediate from Theorem 5.2.5.   □
Extreme value d.f.'s have representations as given in (5.2.11) with δ = 1 and h(x) = −x^{−α} if i = 1, h(x) = −(−x)^α if i = 2, and h(x) = −e^{−x} if i = 3. Moreover, the special case of h = 0 concerns the generalized Pareto densities.

Remark 5.2.8. Corollary 5.2.7 can as well be formulated for densities having the representation

    f(x) = ψ(x)(1 + h(x)).   (5.2.14)
equation

    nφ(b − b⁻¹) = b.
G2,b2( -1
+ x/b 2 ).
exp( -x
+ x/b 2 -
(1 - x/b 2
x 2 /2b 2 )
1
Let again Hβ denote the von Mises distribution with parameter β. Then,

    sup_B |Hβ(B) − μβ(B)| = O(β⁻²).

PROOF.

    = O((log n)⁻²)   (5.2.17)

where Xn:n is the maximum of n i.i.d. standard normal r.v.'s, and bn is the solution of the equation nφ(b − b⁻¹) = b.
Figures 5.2.1-5.2.3 concern the density fn of Φⁿ(bn + an·), with bn = Φ⁻¹(1 − 1/n) and an = 1/(nφ(bn)) (compare with P.5.8), the Gumbel density g3 and the derivative g3(1 + hn) of the expansion in (5.2.16).

Observe that fn and g3(1 + hn) have modes larger than zero; moreover, g3(1 + hn) provides a better approximation to fn than g3.

Figure 5.2.1. Normalized density fn (dotted line) of maximum of normal r.v.'s, Gumbel density g3, and expansion g3(1 + hn) for n = 40.
Figure 5.2.2. fn − g3 and fn − g3(1 + hn).

Figure 5.2.3. fn − g3 and fn − g3(1 + hn) for n = 400.
We are well aware that some statisticians take the slow convergence rate
of order O(1/log n) as an argument against the asymptotic theory of extremes,
perhaps, believing that a rate of order O(n-l/2) ensures a much better accuracy
of an approximation for small sample sizes. However, one may argue that
from the historical and mathematical point of view it is always challenging to
tackle this and related problems. Moreover, one should know that typical
statistical problems in extreme value theory do not concern normal r.v.'s.
The illustrations above and further numerical computations show that the Gumbel approximation to the normalized d.f. and density of the maximum of normal r.v.'s is of reasonable accuracy for small sample sizes. This may
= 1/1(1
+ P + h)
()
Lx- aa
L( _x)aa
Le- ax
i=1
if i = 2
i=3
(5.2.19)
Gp.n(x) = G(x{ 1 - n- P
(5.2.20)
~ p .( _x)<l+ p)a]
e-(l+p)x
i = 1
if i = 2
i=3.
(5.2.21)
Notice that f = ψ(1 + p + h) and Gp,n arise from the special case with i = 2 and α = 1 via the transformation Ti,α = Gi,α⁻¹ ∘ G2,1. It is easy to check that Gp,n is a d.f. if n is sufficiently large; more precisely, this holds if, and only if,
(5.2.22)
Theorem 5.2.11. Let G, ψ and T be as in Corollary 5.2.7. Assume that the underlying density f has the representation

    f(x) = ψ(x)(1 + p(x) + h(x)),  x > T(x0),   (5.2.23)

and = 0, otherwise.
Put
where
Cn,
H(F:, Gp,n)
PROOF.
It was observed by Radtke (1988) (compare with P.5.16) that for a special case the expansion Gp,n(x) can be replaced by G(bn + an x) where G is the leading term of the expansion and bn → 0 and an → 1 as n → ∞. Notice that G(bn + an x) can be written, up to terms of higher order, as

    G(x)[1 + ψ(x)(bn + (an − 1)x)]

where again ψ = G′/G. One can easily check that such a representation holds in (5.2.21) if, and only if, i = 1 and p = 1/α.
ξ_{n(i−1)+1}, ..., ξ_{ni},  i = 1, ..., N.

The r.v.'s ξ_{n(i−1)+1}, ..., ξ_{ni} may correspond to data which are collected within the ith period (as e.g. the amount of daily rainfall within a year). Then, the sample Mn,1, ..., Mn,N of the annual maxima can be used to estimate the unknown distribution of the maximum daily rainfall within a year. Condition
where ξ1, ..., ξk are i.i.d. random variables with common "negative" exponential d.f. F(x) = eˣ for x < 0. An extension of the result for a single order statistic
    sup_B |P{(nVn:n, nVn−1:n, ..., nVn−k+1:n) ∈ B} − P{(S1, S2, ..., Sk) ∈ B}|

    = sup_B |P{nVn−k+1:n ∈ B} − P{Sk ∈ B}|.

It is obvious that "≥" holds. At first sight the equality looks surprising, however, the miracle will have a simple explanation when the distributions are represented in an appropriate way.
From Corollary 1.6.11 it is immediate that

    sup_B |P{(nVn:n, nVn−1:n, ..., nVn−k+1:n) ∈ B} − P{(S1, S2, ..., Sk) ∈ B}|   (5.3.1)

    = sup_B |P{(S1/S2, ..., S_{k−1}/S_k, S_k/(S_{n+1}/n)) ∈ B} − P{(S1/S2, ..., S_{k−1}/S_k, S_k) ∈ B}| =: A.
Notice that the first k − 1 components in the random vectors above are equal. Moreover, it is straightforward to verify that the components in each vector are independent since according to Corollary 1.6.11(iii) the r.v.'s S1/S2, ..., S_n/S_{n+1}, S_{n+1} are independent. An application of inequality (3.3.4) (which concerns an upper bound for the variational distance of product measures via the variational distances of the single components) yields

    A ≤ sup_B |P{S_k/(S_{n+1}/n) ∈ B} − P{S_k ∈ B}| = sup_B |P{nVn−k+1:n ∈ B} − P{S_k ∈ B}|.
Moreover, since S_{n+1}/n → 1 as n → ∞, it follows that

    sup_B |P{(nVn:n, nVn−1:n, ..., nVn−k+1:n) ∈ B} − P{(S1, S2, ..., Sk) ∈ B}| → 0.
It is apparent that (S1, ..., Sk) has the joint density

    exp(x_k),  x_k < ··· < x_1 < 0,   (5.3.2)

and = 0, otherwise. Moreover, the joint d.f. Gi,α,k has the density

    Gi,α(x_k) Π_{j=1}^k ψi,α(x_j),  α(Gi,α) < x_k < ··· < x_1 < ω(Gi,α),   (5.3.5)

where ψ1,α(x_j) = α x_j^{−(α+1)} and ψ2,α(x_j) = α(−x_j)^{α−1}.
Corollary 5.3.3. Let Xr:n be the rth order statistic of n i.i.d. random variables with common generalized Pareto d.f. Wi,α. Then,

    sup_B |P{(cn⁻¹(Xn−j+1:n − dn))_{j=1}^k ∈ B} − Gi,α,k(B)| → 0,  n → ∞,

where cn and dn are the constants of Theorem 5.1.1.

PROOF. Straightforward from Lemma 5.3.2, the definition of Gi,α,k and the fact that
Domains of Attraction

This section concludes with a characterization of the domains of attraction of joint distributions of a fixed number of upper extremes by means of the corresponding result for sample maxima.

First, we refer to the well-known result (see e.g. Galambos (1987), Theorem 2.8.2) that a d.f. belongs to the weak domain of attraction of an extreme value d.f. Gi,α if, and only if, the corresponding result holds for the kth largest order statistic with Gi,α,k as the limiting d.f.

Our interest is focused on the convergence w.r.t. the variational distance.
Theorem 5.3.4. Let F be a d.f. with density f. Then, the following two statements are equivalent:

(i) F belongs to the strong domain of attraction of an extreme value distribution G ∈ {G1,α, G2,α, G3 : α > 0}.

(ii) There exist constants an > 0 and bn such that for every positive integer k there is a nondegenerate distribution G(k) such that

    sup_B |P{(an⁻¹(Xn−j+1:n − bn))_{j=1}^k ∈ B} − G(k)(B)| → 0,  n → ∞.

In addition, if (i) holds for G = Gi,α then (ii) is valid for G(k) = Gi,α,k.
PROOF. Condition (i) may be rewritten as

    sup_B |P{an⁻¹(Xn:n − bn) ∈ B} − G(B)| → 0,  n → ∞,   (1)

where G ∈ {G1,α, G2,α, G3 : α > 0}. According to Lemma 5.1.3, (i) is equivalent to the condition that for every subsequence i(n) there exists a subsequence m(n) := i(j(n)) such that

    m(n) a_{m(n)} f(b_{m(n)} + x a_{m(n)})/ψ(x) → 1,  n → ∞,   (2)

for Lebesgue almost all x ∈ (α(G), ω(G)), and hence componentwise for Lebesgue almost all x = (x_1, ..., x_k) ∈ (α(G), ω(G))ᵏ. Furthermore, deduce with the help of (1.4.4) that the density of (an⁻¹(Xn−j+1:n − bn))_{j=1}^k, say, fn,k, converges along these subsequences for Lebesgue almost all x with α(G) < x_k < ··· < x_1 < ω(G). Lemma 3.3.2 implies (ii) with G(k) = Gi,α,k.
Xl
n -+
00.
(5.4.1)
Put

    u(i, k) = ∫ (x + k)^i dG2,1,k(x).

Then,

    u(i + 2, k) = (i + 1)[k u(i, k) − u(i + 1, k)].   (5.4.2)
Moreover,

    ∫ |x + k|^i dG2,1,k(x) ≤ i!(6k)^{i/2}.   (5.4.3)

Since g2,1,k(x) = eˣ(−x)^{k−1}/(k − 1)!, integration by parts yields the recursion

    u(i + 2, k) = (i + 1)[k u(i, k) − u(i + 1, k)].

Moreover, because of (i + 1)[(i + 1)! + i!] = (i + 2)! we obtain by induction over i that |u(i, k)| ≤ i! k^{i/2}/2. This implies (5.4.3) for every even i. Finally, the Schwarz inequality yields (5.4.3) for odd i.
em > 0 such
that for nand k E {I, ... , n} with kjn sufficiently small (so that the denominators
below are bounded away from zero) the following inequality holds:
2(m-l)
G2,l.k(B) + i~ P(i, n - k)
sup P {n v,,-k+l:n
B} -
J (x + k)i dG2,l,k(X)
B
---------=2..,..(m--~1)~-----=-=--------
1+
Moreover,
p(i,n)
j=O
p(i,n - k)u(i,k)
i=2
PROOF. Put

    gn(x) = e^{−k} (1 + (x + k)/(n − k))^{n−k} (−x)^{k−1}/(k − 1)! 1_{(−n,0)}(x).

From Theorem 1.3.2 we conclude that gn/∫ gn(x) dx is the density of nVn−k+1:n.
Moreover, we write
fix)
[1 + 2:~1)
P(i, n - k)(x
+ kl 2m - l + (x + k)2m]g2, 1,k(x)
(1)
s~p /p{nv,,-k+l:n
:s; C
Moreover, because of (1
Schwarz inequality yields
B} - LJ..(X)dX/ fJ..(X)dX/
:s; C(k/nr
:s; 2G2,1,k(A~)
2(m-l)
(2)
+ kl i dG2,1,k(X)
:s; C(k/nr.
Combining this and (2) the proof is completed.
Theorem 5.4.3. For every positive integer m there exists a constant Cm > 0 such that for every n and k ∈ {1, ..., n} the following inequality holds:

    sup_B |P{nVn−k+1:n ∈ B} − [G2,1,k(B) + Σ_{j=1}^{m−1} ∫_B ρ_{j,k,n} dG2,1,k]| ≤ Cm(k/n)^m

where

    ρ_{1,k,n}(x) = [(x + k)² − k]/2(n − k)   (5.4.4)

and

    ρ_{2,k,n}(x) = β(4, n − k)[(x + k)⁴ − u(4, k)] + β(3, n − k)[(x + k)³ − u(3, k)] + β(2, n − k)[(x + k)² − u(2, k)].
    sup_B |P{cn⁻¹(Xn−k+1:n − dn) ∈ B} − [Gi,α,k(B) + Σ_{j=1}^{m−1} ∫_B ρ_{j,k,n}(log Gi,α) dGi,α,k]| ≤ Cm(k/n)^m   (5.4.5)

where cn and dn are the constants of Theorem 5.1.1 and ρ_{j,k,n} are the polynomials of Theorem 5.4.3.
Next, we prove the corresponding result for joint distributions of upper
extremes.
Theorem 5.4.4. Let Xn:n, ..., Xn−k+1:n be the k largest order statistics under the generalized Pareto d.f. W ∈ {W1,α, W2,α, W3 : α > 0}. Let cn, dn, Cm, and ρ_{j,k,n} be as above. Then,

    sup_B |P{(cn⁻¹(Xn:n − dn), ..., cn⁻¹(Xn−k+1:n − dn)) ∈ B} − [Gi,α,k(B) + Σ_{j=1}^{m−1} ∫_B ρ_{j,k,n}(log Gi,α(x_k)) dGi,α,k(x)]| ≤ Cm(k/n)^m.   (5.4.6)
PROOF. It suffices to prove the assertion in the special case of i = 2 and α = 1. The general case can easily be deduced by means of the transformation technique. Thus, we have to prove that

    sup_B |P{(nVn:n, ..., nVn−k+1:n) ∈ B} − [G2,1,k(B) + Σ_{j=1}^{m−1} ∫_B ρ_{j,k,n}(x_k) dG2,1,k(x)]| ≤ Cm(k/n)^m.   (5.4.7)
The measure pertaining to the expansion has the density

    (1 + Σ_{j=1}^{m−1} ρ_{j,k,n}(x_k)) g2,1,k(x).

By inducing with x → (x_1/x_2, ..., x_{k−1}/x_k, x_k) one obtains a product measure where the kth component has the density

    (1 + Σ_{j=1}^{m−1} ρ_{j,k,n}) g2,1,k.

Now inequality (A.3.3), which holds for signed measures, and Theorem 5.4.3 imply the assertion.   □
Next, Theorem 5.4.4 will be stated once more in the particular case of m = 1. In an earlier version of this book we conjectured that a d.f. F has the tail of a generalized Pareto d.f. if an inequality of the form (5.4.1) (formulated for d.f.'s) holds. This was confirmed in Falk (1989a).

Theorem 5.4.5. (i) If Xn:n, ..., Xn−k+1:n are the k largest order statistics under a generalized Pareto d.f. W ∈ {W1,α, W2,α, W3 : α > 0} then there exists a constant C > 0 such that for every k ∈ {1, ..., n},

    sup_B |P{(cn⁻¹(Xn:n − dn), ..., cn⁻¹(Xn−k+1:n − dn)) ∈ B} − Gi,α,k(B)| ≤ Ck/n.
~~
WI, ..(x)
For a slightly stronger formulation of (ii) and for the proof we refer to Falk
(1989a).
Lemma 5.5.1. Given 𝒢 ∈ {G1,α,k, G2,α,k, G3,k : α > 0} let G denote the first marginal d.f. Let Xn:n ≥ ··· ≥ Xn−k+1:n be the k largest order statistics of n i.i.d. random variables with d.f. F and density f. Define again ψ = g/G on the support of G where g is the density of G. Moreover, fix x0 ≥ −∞. Then,

    sup_B |P{(Xn:n, ..., Xn−k+1:n) ∈ B} − 𝒢(B)|   (5.5.1)

    ≤ [∫_M Σ_{j=1}^k [nf/ψ − 1 − log(nf/ψ)](x_j) d𝒢(x) + 2𝒢(Mᶜ)]^{1/2} + Ck/n

where M = {x : x_j > x0, j = 1, ..., k}.

PROOF.
We have

    P{(Xn:n, ..., Xn−k+1:n) ∈ B} = P{[F⁻¹(1 + (nVn−j+1:n)/n)]_{j=1}^k ∈ B} = μn(B) + O(k/n)   (1)

uniformly over n, k, and Borel sets B where the measure μn is defined by

    μn(B) = P{[F⁻¹(1 + S_j/n)]_{j=1}^k ∈ B}.
In analogy to the proof of Theorem 1.4.5, part III (see also Remark 1.5.3) deduce that μn has the density hn defined by

    hn(x) = exp(n(F(x_k) − 1)) Π_{j=1}^k (n f(x_j)),  x_1 > ··· > x_k,

and = 0, otherwise.
In (1), the measure μn can be replaced by the probability measure Qn = μn/bn where

    bn = 1 − exp(−n) Σ_{j=0}^{k−1} n^j/j! = 1 + O(k/n).

Denote by gk the density of 𝒢. Recall that gk(x) = G(x_k) Π_{j=1}^k ψ(x_j) for α(G) < x_k < ··· < x_1 < ω(G). Now, Lemma A.3.5, applied to Qn and 𝒢, implies the asserted inequality (5.5.1).   □
Next we formulate a simple version of Theorem 5.5.4 as an analogue to
Corollary 5.2.3. The proof can be left to the reader.
Corollary 5.5.2. Denote by Gj the jth marginal d.f. of 𝒢 ∈ {G1,α,k, G2,α,k, G3,k : α > 0}, and write G = G1. If, in addition to the conditions of Lemma 5.5.1, G{f > 0} = 1 and ω(F) = 0 for i = 2, then

    sup_B |P{(Xn:n, ..., Xn−k+1:n) ∈ B} − 𝒢(B)|

    ≤ [Σ_{j=1}^k ∫ [nf/ψ − 1 − log(nf/ψ)] dGj]^{1/2} + Ck/n
k}
1/2 (log(k
+ lW
log n
The following theorem can be regarded as the main result of this section. Notice that the integrals in the upper bound have only to be computed on (x0, ω(F)). Moreover, the condition G{f > 0} = 1 as used in Corollary 5.5.2 is omitted.
Theorem 5.5.4. Denote by Gj the jth marginal d.f. of 𝒢 ∈ {G1,α,k, G2,α,k, G3,k : α > 0}, and put G = G1. Let F be a d.f. with density f such that f(x) > 0 for x0 < x < ω(F). Assume that ω(F) = ω(G). Define again ψ = g/G on the support of G where g is the density of G. Then,

    sup_B |P{(Xn:n, ..., Xn−k+1:n) ∈ B} − 𝒢(B)|   (5.5.2)

    ≤ [Σ_{j=1}^k ∫_{x0}^{ω(G)} [nf/ψ − 1 − log(nf/ψ)] dGj + 2𝒢(Mᶜ)]^{1/2} + Ck/n.
PROOF. To prove (5.5.2) one has to establish an upper bound of the right-hand
side of (5.5.1).
Note that under the present conditions
Xl
Xk
= I, ... ,k}.
of~.
(1)
I':6
gk
f ~~ (1 -
[1 - F(x k )] d~(x) =
F)dGk ::;
W(G)
Xo
= j~
Moreover,
(log G(x k
d~(x)
f(y)Gk(y)dy
[(k-l
)]
(fN) j~o(-logGYlj!
dG
(2)
fW(G)
(fN)dGj .
Xo
W(G)
Xo
= - k
W(G)
Xo
= -
f~~
k(I -
Gk+l
(3)
(x o
Now the proof can easily be completed by combining (5.5.1) with (1)-(3).
Notice that Theorem 5.2.5 is a special case of Theorem 5.5.4.
    f(x) = ψ(x)e^{h(x)},   (5.5.3)
B} -
if
i= 1
i=2
i= 3
(5.5.4)
where cn, dn are the constants of Theorem 5.1.1 and D > 0 is a constant which only depends on x0, α, and L.
We have dn = 0 if i = 1, 2, and dn = log n if i = 3; moreover, cn = n^{1/α} if i = 1, cn = n^{−1/α} if i = 2, and cn = 1 if i = 3.
PROOF. Again it suffices to prove the result for the particular case G = G2,1. Theorem 5.5.4 will be applied to x0,n = nx0 and fn(x) = f(x/n)/n. We obtain
[t
a.. (B) I
+ (1 + (k -
(1)
+ Ckln.
Check that Gk(x) = O((k/|x|)^m) uniformly in k and x < 0 for every positive integer m. Moreover, since h is bounded on (x0, 0) we have

    Σ_{j=1}^k ∫_{−∞}^0 [nfn/ψ − 1 − log(nfn/ψ)] dGj ≤ D n^{−2δ} Σ_{j=1}^k Γ(2δ + j)/Γ(j)   (2)

where Γ denotes the gamma function.
Finally, observe that (compare with Erdélyi et al. (1953), formula (5), page 47)

    Σ_{j=1}^k Γ(2δ + j)/Γ(j) ≤ D Σ_{j=1}^k j^{2δ}.   (3)

Now by choosing m sufficiently large the proof can be completed.
Preliminaries

Let ξ1, ..., ξn be i.i.d. random variables with common d.f. F which belongs to the weak domain of attraction of G ∈ {G1,α, G2,α, G3 : α > 0}. Hence according to (5.1.4) there exist an > 0 and bn such that

    n(1 − F(bn + an x)) → −log G(x),  n → ∞,   (5.6.1)

for x ∈ (α(G), ω(G)). According to the Poisson approximation to binomial r.v.'s we know that the number of exceedances

    Σ_{j=1}^n 1_{(x,∞)}(an⁻¹(ξj − bn))   (5.6.2)

is asymptotically a Poisson r.v. This number may be written as

    Σ_{j=1}^n ε_{(ξj−bn)/an}(B)   (5.6.3)
where εz(B) = 1_B(z) and B = (x, ∞). With B varying over all Borel sets we obtain the empirical (point) process

    Nn = Σ_{j=1}^n ε_{(ξj−bn)/an}.   (5.6.4)

Generally, a point measure μ is of the form Σ_{j∈J} ε_{xj} with J countable and μ(K) < ∞ for every relatively compact set K. The set of all point measures M is endowed with the smallest σ-field 𝓐 such that the "projections" μ → μ(B) are measurable. It is apparent that N: Ω → M is measurable if N(B): Ω → [0, ∞] is measurable for every Borel set B. If N is measurable then N is called a point process. Hence, the empirical process is a point process. Certain Poisson processes will be the limiting processes of empirical processes.
(5.6.5)
In the limit this point process will be the homogeneous Poisson process N0 with unit rate. The Poisson process N0 is defined by

    N0 = Σ_{j=1}^∞ ε_{Sj}   (5.6.6)

where Sj is the sum of j i.i.d. standard "negative" exponential r.v.'s. Moreover, M is the set of all point measures on the Borel sets in (−∞, 0).
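The relation between N0 and the partial sums Sj can be checked through the identity P{N0(−t, 0) ≤ k − 1} = P{|Sk| ≥ t}: since the points Sj decrease, fewer than k points fall in (−t, 0) exactly when the kth point lies left of −t, so the Poisson d.f. at k − 1 equals the Erlang(k) tail at t. A minimal numerical sketch (t, k and the integration grid are arbitrary choices):

```python
import math

def poisson_cdf(t, m):
    """P_t{0, ..., m} for a Poisson r.v. with parameter t."""
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(m + 1))

def erlang_tail(k, t, m=200000, hi=60.0):
    """P{T_k > t} for T_k = |S_k|, a sum of k standard exponentials, via
    midpoint-rule integration of the Erlang density x^{k-1} e^{-x}/(k-1)! over (t, hi)."""
    h = (hi - t) / m
    return sum(((t + (i + 0.5) * h) ** (k - 1)) * math.exp(-(t + (i + 0.5) * h))
               for i in range(m)) * h / math.factorial(k - 1)

# N0(-t, 0) <= k - 1 exactly when the kth point S_k lies left of -t
t, k = 3.0, 4
lhs = poisson_cdf(t, k - 1)   # P{N0(-t, 0) <= k - 1}
rhs = erlang_tail(k, t)       # P{|S_k| >= t}
```

The identity is what connects the limit laws for the kth largest order statistic (Section 5.1) with the Poisson-process formulation of this section.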
For every s > 0 and n = 0, 1, 2, ... define the truncation Nn^(s) by

    Nn^(s) = Nn(· ∩ [−s, 0)).   (5.6.7)

Theorem 5.6.1. There exists a universal constant C > 0 such that for every positive integer n and s ≤ log(n) the following inequality holds:

    sup_{M∈𝓐} |P{Nn^(s) ∈ M} − P{N0^(s) ∈ M}| ≤ Cs/n.   (5.6.8)
PROOF. Let Vn:n ≥ ··· ≥ V1:n be the order statistics of n i.i.d. random variables with uniform distribution on (−1, 0). Let k ≡ k(n) be the smallest integer such that
(1)
Define

    N0,k^(s) = Σ_{i=1}^k ε_{Si}(· ∩ [−s, 0))

and

    Nn,k^(s) = Σ_{i=1}^k ε_{nVn−i+1:n}(· ∩ [−s, 0)).   (2)
First,

    sup_{M∈𝓐} |P{Nn^(s) ∈ M} − P{Nn,k^(s) ∈ M}| ≤ n⁻¹.   (3)
Note that Nn,k^(s), n ≥ 1, and N0,k^(s) may be written as the composition of the random vectors (nVn:n, ..., nVn−k+1:n), n ≥ 1, and (S1, ..., Sk), respectively, and the measurable map

    (x_1, ..., x_k) → Σ_{i=1}^k ε_{xi}(· ∩ [−s, 0)).
MeA
(6)
MeA
(7)
Now (3), (5), (7), and the triangle inequality imply the asserted inequality.
The bound in Theorem 5.6.1 is sharp. Notice that for every k ∈ {1, ..., n}

    sup_{−s ≤ −t} |P{Nn(−t, 0) < k − 1} − P{N0(−t, 0) < k − 1}|.   (5.6.9)
Hence a remainder term of a smaller order than that in (5.6.8) would yield a
result for order statistics which does not hold according to the expansion of
length 2 in Theorem 5.4.3.
Extensions

Denote by ν0 the Lebesgue measure restricted to (−∞, 0). Recall that ν0 is the intensity measure of the homogeneous Poisson process N0. We have
(5.6.10)
Write again Ti,α = Gi,α⁻¹ ∘ G2,1 (see (1.6.10)). Denote by Mi the set of point measures on (α(Gi,α), ω(Gi,α)) and by 𝓐i the pertaining σ-field. Denote by Ti,α also the map from M1 to Mi where Ti,α μ is the measure induced by μ and Ti,α. Notice that if μ = Σ_{j∈J} ε_{xj} then

    Ti,α μ = Σ_{j∈J} ε_{Ti,α(xj)}.
Define

    Ni,α,n = Ti,α(Nn) = Σ_{k=1}^n ε_{cn⁻¹(ξk − dn)}   (5.6.12)

where ξ1, ..., ξn are i.i.d. random variables with common generalized Pareto d.f. Wi,α; moreover, cn > 0 and dn are the usual normalizing constants as defined in (1.3.13).
It is well known that Ni,α ≡ Ni,α,0 is a Poisson process with intensity measure νi,α = Ti,α ν0 (having the mean value function log(Gi,α)). Recall that the distribution of Ni,α is uniquely characterized by the following two properties:

(a) Ni,α(B) is a Poisson r.v. with parameter νi,α(B) if νi,α(B) < ∞, and
(b) Ni,α(B1), ..., Ni,α(Bm) are independent r.v.'s for mutually disjoint Borel sets B1, ..., Bm.
Define the truncated point processes Ni,α,n^(s) by

    Ni,α,n^(s) = Ni,α,n(· ∩ Ti,α([−s, 0))).   (5.6.13)

From Theorem 5.6.1 and (5.6.11) it is obvious that the following result holds.

Corollary 5.6.2. There exists a universal constant C > 0 such that for every positive integer n and s ≤ log(n) the following inequality holds:

    sup_{M∈𝓐i} |P{Ni,α,n^(s) ∈ M} − P{Ni,α,0^(s) ∈ M}| ≤ Cs/n.   (5.6.14)
Final Remarks

Theorem 5.6.1 and Corollary 5.6.2 can easily be extended to a large class of d.f.'s F belonging to a neighborhood of a generalized Pareto d.f. Wi,α with Ni,α,0^(s) again being the approximating Poisson process. This can be proved just by replacing Theorem 5.4.3 in the proof of Theorem 5.6.1 (for appropriate inequalities we refer to Section 5.5). Moreover, in view of (5.6.9) and Theorem 5.4.5(ii) it is apparent that a bound of order O(s/n) can only be achieved if F has the upper tail of a generalized Pareto d.f. The details will be omitted since this topic will not be pursued further in this book.

In statistical applications one gets in the most simple case a model of independent Poisson r.v.'s by choosing mutually disjoint sets. The value of s has to be large to gain efficiency; on the other hand, the Poisson model provides an accurate approximation only if s is sufficiently small compared to n. The limiting model is represented by the unrestricted Poisson processes Ni,α. One has to consider Poisson processes with intensity measures depending on location and scale parameters if the original model includes such parameters. This family of Poisson processes can again be studied within a 3-parameter representation.
(X
> o}.
2. Check that the necessary and sufficient conditions (5.1.5)-(5.1.7) are trivially satisfied by the generalized Pareto d.f.'s in the following sense:

(i) For x > 0 and t such that tx > 1:

    (1 − W1,α(tx))/(1 − W1,α(t)) = x^{−α}.

(ii) For W3:

    ∫_t^{ω(W3)} (1 − W3(y)) dy/(1 − W3(t)) = 1

and

    (1 − W3(t + x))/(1 − W3(t)) = e^{−x}.
3. Let F1, F2, F3, ... be d.f.'s. Define Gn*(x) = Fn(bn* + an* x) and Gn(x) = Fn(bn + an x) where an*, an > 0. Assume that for some nondegenerate d.f. G*,

    Gn* → G*  weakly

as n → ∞.
+ 4x + 1 =
+ 4x + 1,
X E
(c,O).
->
G2 1 (x)
as n ->
00.
and
on (l(G), w(G))
6. Let
that
n ->
00,
(l(G), w(G)).
(Chibisov, 1964)
'i
4: -,;). Prove
Let
a. =
l/mp(b.) and
Show that
+ (1
- 1') for x
~ Xo.
+ a.x) -
G3 (x)(1
and, thus,
    sup_x |Fⁿ(bn + x/bn) − G3(x)| = O((log n)⁻¹).
Figure: densities with α = 0.5, 1, 1.5, 2, 3.
10. For β ≠ 0 define

    Vβ(x) = 1 − (1 + βx)^{−1/β}  if 0 < x,  for β > 0,

and

    Vβ(x) = 1 − (1 + βx)^{−1/β}  if 0 < x < −1/β,  for β < 0.

For β = 0, define

    V0(x) = 1 − e^{−x},  x > 0.

Show that

    W1,1/β(x) = Vβ((x − 1)/β)  if β > 0,

    W2,1/|β|(x) = Vβ((x + 1)/|β|)  if β < 0,

    W3(x) = V0(x).

If β > 0 then the density is vβ(x) = (1 + βx)^{−(1+1/β)}. If β < 0 then vβ(x) = (1 + βx)^{−(1+1/β)} for 0 < x < −1/β.
Figure P.5.3. vβ with β = 0, 0.6, 2. Figure P.5.4. vβ with β ≤ 0.

The Pareto densities vβ with β ≥ 0 (Figure P.5.3) and the generalized Pareto type II densities vβ with β ≤ 0 (Figure P.5.4) approach the standard exponential density v0 (dotted curve).
11. (Maxima with random indices)
Let ξi, i = 1, 2, ... be i.i.d. random variables and let N(i), i = 0, 1, ... be positive integer-valued r.v.'s.

(i) If G is an extreme value d.f., and

    (a) P{an⁻¹(Xn:n − bn) ≤ x} → G(x),  n → ∞,

    (b) N(i) → ∞ in probability,  i → ∞,

then the corresponding result holds for the maxima with the random indices N(i).
(Barndorff-Nielsen, 1964)
(ii) If the sequence (ξi) and N(j) are independent for every j then the condition (i)(b) can be replaced by

    (b′) N(i)/i → N(0),  i → ∞.

(iii) Show that the independence condition in (ii) cannot be omitted without compensation. [Hint: Define N(i) = min{j: ξj > log i} for standard exponential r.v.'s.]
(M. Falk)
12. Show that the Cauchy d.f. with scale parameter σ = π satisfies condition (5.2.14) for i = 1 and α = 1 with δ = 1. As a consequence one gets for the maximum Xn:n of standard Cauchy r.v.'s that

    sup_B |P{(π/n)Xn:n ∈ B} − G1,1(B)| = O(n⁻¹).
13. Let F(x) = 1 − exp(−x^τ), x > 0, with τ ≠ 1. Show that

    sup_B |P{an⁻¹(Xn:n − bn) ∈ B} − G3(B)| = O(1/log n).
14. Let f = F′ > 0 on the interval (x0, ω(F)) and put H = (1 − F)/f. Assume that the von Mises condition (5.1.25) holds for some i ∈ {1, 2, 3} and α > 0 (with α = 1 if i = 3). Thus, we have

    hi,α(x) := αH′(x) − Ti,α(−1) → 0  as  x ↑ ω(F).

Show that, for suitable points xn,

    sup_B |P{an⁻¹(Xn:n − bn) ∈ B} − Gi,α(B)| = O(|hi,α(xn)| + n⁻¹)

with

    an = α/(n f(xn))  and  bn = xn − Ti,α(−1) an.

(Radtke, 1988)
f" > 0
on the interval
= h:'.H/hi. + 7; . ( -1)/a
93(X
= 91 .(x)(K, + o(Xo)) as x
= 92 .(w(F)
+ tH(x)) =
93(X)(K,
- x)(K,
+ o(Xo))
->
= 1,
(i) Let
x;::: I,
for some
C(
p :-:::; 1. Then
C(
P[I-1 +~OgXl
x;::: I,
> O. Then
    |Fⁿ(bn + an x) − G1,α(x)(1 − h1,α(xn) ψ1,α(x)[x − 1]²/2)| = O((log n)²/n² + n⁻¹)

and

    h1,α(xn) = O(n⁻¹).
17. (i) Prove that for a d.f. F and a positive integer k the following two statements are equivalent:

(a) F belongs to the weak domain of attraction of an extreme value d.f. G ∈ {G1,α, G2,α, G3 : α > 0}.

(b) There are constants an > 0 and bn such that the d.f.'s Fn,k defined by

    Fn,k(x) = P{an⁻¹(Xn:n − bn) ≤ x_1, ..., an⁻¹(Xn−k+1:n − bn) ≤ x_k}

converge weakly to a nondegenerate d.f. G(k).

(ii) In addition, if (a) holds for G = Gi,α then (b) is valid for G(k) = Gi,α,k.

18.
(i) There exists a constant C > 0 such that for every positive integer nand
k E {I, 2, ... , [n/2]} the following inequality holds:
n:
where
~ l ' ... , ~k
B}
    f(x) = 1 − |x|,  |x| ≤ 1,
one gets
sup IP{(n/2)1/2(X._ i + 1 :.
B
3/2 /n
Bibliographical Notes

An excellent survey of the literature concerning classical extreme value theory can be found in the book of Galambos (1987). Therefore it suffices here to repeat only some of the basic facts of the classical part and, in addition, to give a more detailed account of the recent developments concerning approximations w.r.t. the variational distance etc. and higher order approximations.

Out of the long history of the meanwhile classical part of extreme value theory, we have already mentioned the pioneering work of Fisher and Tippett (1928), who provided a complete list of all possible limiting d.f.'s of sample maxima. Gnedenko (1943) found necessary and sufficient conditions for a d.f. to belong to the weak domain of attraction of an extreme value d.f. De Haan (1970) achieved a specification of the auxiliary function in Gnedenko's characterization of F to belong to the domain of attraction of the Gumbel d.f. G3.
The conditions (1, α) and (2, α) in (5.1.24) which are sufficient for a d.f. to belong to the weak domain of attraction of the extreme value d.f.'s G1,α and G2,α are due to von Mises (1936). The corresponding condition (5.1.24)(3) for the Gumbel d.f. G3 was found by de Haan (1970). Another set of "von Mises conditions" is given in (5.1.25) for d.f.'s having two derivatives. For i = 3 this condition is due to von Mises (1936). Its extension to the cases i = 1, 2 appeared in Pickands (1986).
In conjunction with strong domains of attraction, the von Mises conditions have gained new interest. The pointwise convergence of the densities of sample maxima under the von Mises condition (5.1.25), i = 3, was proved in Pickands (1967) and independently in Reiss (1977a, 1981d). A thorough study of this subject was carried out by de Haan and Resnick (1982), Falk (1985b), and Sweeting (1985).

Sweeting, in his brilliant work, was able to show that the von Mises conditions (5.1.24) are equivalent to the uniform convergence of densities of normalized maxima on finite intervals. We also mention the article of Pickands (1986) where a result closely related to that of Sweeting is proved under certain differentiability conditions imposed on F.
In (5.1.31) the number of exceedances of n i.i.d. random variables over a threshold u_n was studied to establish the limit law of the kth largest order statistic. The key argument was that the number of exceedances is asymptotically a Poisson r.v. This result also holds under weaker conditions. We mention Leadbetter's conditions D(u_n) and D′(u_n) for a stationary sequence (for details see Leadbetter et al. (1983)).
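The Poisson argument can be illustrated numerically. The following sketch (my illustration, not the book's; the parameter choices are assumptions) counts exceedances of n i.i.d. (0, 1)-uniform r.v.'s over the threshold u_n = 1 − k/n; the count is Binomial(n, k/n) and hence approximately Poisson(k):

```python
import random

# Sketch: exceedance counts over u_n = 1 - k/n are Binomial(n, k/n),
# which is approximately Poisson(k) for large n (mean and variance ~ k).
random.seed(1)
n, k = 500, 3

def exceedances():
    u_n = 1.0 - k / n
    return sum(1 for _ in range(n) if random.random() > u_n)

trials = 4000
counts = [exceedances() for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
# For a Poisson r.v., mean and variance agree (here both close to k)
```

For dependent stationary sequences the same Poisson limit holds under Leadbetter's conditions D(u_n) and D′(u_n); the simulation above only covers the i.i.d. case.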
A necessary and sufficient condition (see P.5.5(ii)) for the weak convergence of normalized distributions of intermediate order statistics is due to Chibisov (1964). The possible limiting d.f.'s were characterized by Chibisov (1964) and Wu (1966) (see P.5.5(i)). Theorem 5.1.7, formulated for G_{3,k} instead of N(0,1), is given in Reiss (1981d) under the stronger condition that the von Mises condition (5.1.25), i = 3, holds; by the way, this result was proved via the normal approximation. The weak convergence of intermediate order statistics was extensively dealt with by Cooil (1985, 1988). Cooil proved the asymptotic joint normality of a fixed number of suitably normalized intermediate order statistics under conditions that correspond to those in Theorem 5.1.7. For the treatment of intermediate order statistics under dependence conditions we refer to Watts et al. (1982).
Bounds for the remainder terms of limit laws concerning maxima were
established by various authors. We refer to W.J. Hall and J.A. Wellner (1979),
P. Hall (1979), R.A. Davis (1982), and the book of Galambos (1987) for bounds
with explicit constants.
As pointed out by Fisher and Tippett (1928), extreme value d.f.'s different from the limiting ones (penultimate d.f.'s) may provide a more accurate approximation to d.f.'s of sample maxima. This line of research was taken up by Gomes (1978, 1984) and Cohen (1982a, b). Cohen (1982b), Smith (1982), and Anderson (1984) found conditions that allow the computation of the rate of convergence w.r.t. the Kolmogorov–Smirnov distance. Another notable article pertaining to this is that of Zolotarev and Rachev (1985), who applied the method of metric distances.
It can easily be deduced from a result of Matsunawa and Ikeda (1976) that the variational distance between the normalized distribution of the k(n)th largest order statistic of n independent, identically (0, 1)-uniformly distributed r.v.'s and the gamma distribution with parameter k(n) tends to zero as n → ∞ if k(n)/n tends to zero as n → ∞. In Reiss (1981d) it was proved that the accuracy of this approximation is ≤ Ck/n for some universal constant C. This result was taken up by Falk (1986a) to prove an inequality related to (5.2.6) w.r.t. the variational distance. A further improvement was achieved in Reiss (1984): By proving the result in Reiss (1981d) w.r.t. the Hellinger distance and by using an inequality for induced probability measures (compare with Lemma 3.3.13) it was shown that Falk's result still holds if the variational distance is replaced by the Hellinger distance. The present result is a further improvement since the upper bound only depends on the upper tail of the underlying distribution.
The investigation of extremes under densities of the form (5.2.14) was initiated by L. Weiss (1971), who studied the particular case of a neighborhood of Weibull densities. The class of densities defined by (5.2.18) and (5.2.19) corresponds to the class of d.f.'s introduced by Hall (1982a).
It is evident that if the underlying d.f. only slightly deviates from an extreme value d.f. then the rate of convergence of the d.f. of the normalized maximum to the limit d.f. can be of order o(n⁻¹). The rate is of exponential order if F has the same upper tail as an extreme value d.f. It was shown by Rootzen (1984) that this is the best order achievable under a d.f. unequal to an extreme value d.f. It would be of interest to explore, in detail, the rates for the second largest order statistic.
Because of historical reasons we note the explicit form of the interesting expansion in Uzgoren (1954), which could have served as a guide to the mathematical research of expansions in extreme value theory:

    [expansion with correction terms of orders 1/(2n), 1/(24n²) and 1/(8n³) in e^(−x), e^(−2x), e^(−3x), where b_n = F⁻¹(1 − 1/n) and g = (1 − F)/f]

The first two terms of the expansion formally agree with those in (5.2.16) in the Gumbel case. However, as reported by T.J. Sweeting (talk at the Oberwolfach meeting on "Extreme Value Theory," 1987), the expansion is not valid as far as the third term is concerned.
Other references pertaining to this are Dronskers (1958), who established an approximate density of the k(n)th largest order statistic, and Haldane and Jayakar (1963), who studied the particular case of extremes of normal r.v.'s.

Expansions of length 2 related to that in (5.2.16) are well known in the literature (e.g. Anderson (1971) and Smith (1982)). These expansions were established in … thus allowing a simultaneous treatment of the time scale and the sequence of observations.
From the statistical point of view the weak convergence is not satisfactory. The condition that F belongs to the domain of attraction of an extreme value d.f. is not strong enough to yield e.g. the existence of a consistent estimator of the tail index (that is, the index γ of the domain of attraction). Thus, the weak …
CHAPTER 6
… has m + 1 bounded derivatives …, where f = F′. … Then

    G_{r(n),n} = Φ + φ Σ_{i=1}^{m−1} n^(−i/2) S_{i,n}     (6.1.1)

where

    S_{1,n}(t) = [(2q − 1)/(3σ) + σf′(F⁻¹(q))/(2f(F⁻¹(q))²)] t²     (6.1.2)

    + [(−q + nq − r(n) + 1)/σ + 2(2q − 1)/(3σ)].     (6.1.3)
We have

    |∫_{B_n} h(x) f_{r(n),n}(x) dx − ∫_{B_n} h(x) g_{r(n),n}(x) dx|
      = O( n^(−m/2) [ ∫ |x|^k φ(x)(1 + |x|^(3m)) dx + ∫_{B_n^c} |x|^k (f_{r(n),n}(x) + |g_{r(n),n}(x)|) dx ] ).     (1)

It remains to prove an upper bound for the second term on the right-hand side of (1). Straightforward calculations yield

    ∫_{B_n^c} |x|^k |g_{r(n),n}(x)| dx = O(n^(−m/2))

and

    ∫_{B_n^c} |x|^k f_{r(n),n}(x) dx = O(n^(−m/2)).
Apparently,

    α_n = ∫ |X_{r(n):n} − F⁻¹(q)|^k dP
        = ∫_{X_{r(n):n} < F⁻¹(q) − t_n} |X_{r(n):n} − F⁻¹(q)|^k dP + … =: α_{n,1} + α_{n,2},

and

    α_{n,1} = O( P{U_{r(n):n} ≤ F(F⁻¹(q) − t_n)} + ∫_{X_{r(n):n} < F⁻¹(q) − t_n} |X_{r(n):n}|^k dP … )

where b denotes the beta function. Applying Lemma 3.1.1 one obtains that α_{n,1} = O(n^(−m/2)). We may also prove α_{n,2} = O(n^(−m/2)), which completes the proof of

    … = (q(1 − q))^(k/2) / (n^(k/2) f(F⁻¹(q))^k) ∫ |x|^k dΦ(x) + O(n^(−(k+1)/2)).     (6.1.4)
Theorem 6.1.2. Let q ∈ (0, 1) be fixed. Suppose that the d.f. F has m + 1 derivatives on a neighborhood of F⁻¹(q), and that f(F⁻¹(q)) > 0 where f = F′. Suppose that (r(n)/n − q) = O(n⁻¹). Then there exist polynomials R_{i,n}, i = 1, …, m − 1, such that uniformly over |x| ≤ log n,

    …     (6.1.6)

and

    … = α + O(n^(−m/2)).     (6.1.7)
normality of a linear combination of order statistics related to the Lindeberg condition for sums of independent r.v.'s or martingales.

Such a condition (see (6.2.4)) was found by Hecker (1976) in the special case of order statistics of i.i.d. uniform r.v.'s. This theorem is a simple application of the central limit theorem.
Theorem 6.2.1. Given a triangular array of constants a_{i,n}, i = 1, …, n, define

    b_{j,n} = Σ_{i=j}^{n} a_{i,n} − Σ_{i=1}^{n} (i/(n + 1)) a_{i,n},    j = 1, …, n + 1,     (6.2.1)

and

    τ_n² = (n + 1)⁻² Σ_{j=1}^{n+1} b_{j,n}².     (6.2.2)

Then,

    P{ τ_n⁻¹ Σ_{i=1}^{n} a_{i,n}(U_{i:n} − i/(n + 1)) ≤ t } → Φ(t),    n → ∞,     (6.2.3)

if, and only if,

    max_{1≤j≤n+1} |b_{j,n}| / ((n + 1)τ_n) → 0,    n → ∞.     (6.2.4)
PROOF. Let η₁, η₂, η₃, … be i.i.d. standard exponential r.v.'s. Put S_i = Σ_{j=1}^{i} η_j. From Corollary 1.6.9 it is immediate that

    …     (1)

Check that

    Σ_{i=1}^{n} a_{i,n}[S_i − iS_{n+1}/(n + 1)] = Σ_{j=1}^{n+1} b_{j,n}η_j.     (2)

From (2) and the fact that Eη_j = 1 it is clear that

    E Σ_{j=1}^{n+1} b_{j,n}η_j = Σ_{j=1}^{n+1} b_{j,n} = 0.     (3)

Consequently, τ_n² is the variance of (n + 1)⁻¹ Σ_{j=1}^{n+1} b_{j,n}η_j. Moreover, since S_{n+1}/(n + 1) → 1 in probability as n → ∞ we deduce from (1)–(3) that (6.2.3) holds if, and only if,

    P{ (n + 1)⁻¹ τ_n⁻¹ Σ_{j=1}^{n+1} b_{j,n}(η_j − 1) ≤ t } → Φ(t),    n → ∞.     (4)

The equivalence of (4) and (6.2.4) is a particular case of the Lindeberg–Lévy–Feller theorem as proved in Chernoff et al. (1967), Lemma 1. □
EXAMPLE 6.2.2. If a_{r,n} = 1 and a_{i,n} = 0 for i ≠ r (that is, we consider the order statistic U_{r:n}) then τ_n² = r(n − r + 1)/(n + 1)³ in (6.2.2). Furthermore, (6.2.4) is equivalent to r(n) → ∞ and n − r(n) → ∞ as n → ∞ with r(n) in place of r. Thus,

    P{ τ_n⁻¹(U_{r(n):n} − r(n)/(n + 1)) ≤ t } → Φ(t)     (6.2.5)

for every t if, and only if, (6.2.4) holds.
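Example 6.2.2 can be checked by a small simulation (a sketch under assumed parameters, not code from the book): a central order statistic U_{r:n} of uniform r.v.'s is centered at r/(n + 1) and scaled by τ_n with τ_n² = r(n − r + 1)/(n + 1)³.

```python
import math
import random

# Sketch: normalized central order statistic of uniform r.v.'s, using
# tau_n^2 = r(n - r + 1)/(n + 1)^3 from Example 6.2.2.
random.seed(4)
n, q = 400, 0.3
r = int(q * (n + 1))  # a rank growing like nq (assumed convention)
tau = math.sqrt(r * (n - r + 1)) / (n + 1) ** 1.5

def normalized_order_stat():
    u_r = sorted(random.random() for _ in range(n))[r - 1]
    return (u_r - r / (n + 1)) / tau

vals = [normalized_order_stat() for _ in range(2000)]
frac = sum(v <= 0 for v in vals) / len(vals)  # should be near Phi(0) = 1/2
```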
Of course, this result is very artificial. It would be interesting to know
whether the index set In can be replaced by {r(n) + i: i = 1, ... , k(n)} etc. It is
left to the reader to formulate other theorems of this type by using Theorem
4.3.1 or Theorem 4.5.3.
    P{ Σ_i X_{i:n} ≤ t } = ∫ P{ x + Σ_i Y_{i:s−r−1} + y ≤ t } dQ_{r,s,n}(x, y)
                         = ∫ P{ Σ_{i=1}^{s−r−1} Y_i ≤ t − (x + y) } dQ_{r,s,n}(x, y)

where Y₁, …, Y_{s−r−1} are i.i.d. random variables with common d.f. F_{x,y}. Now we are able to apply the classical results for sums of i.i.d. random variables to the integrand. Moreover, Section 4.5 provides a normal approximation to Q_{r,s,n}. Concerning more details we refer again to Helmers (1982).
Systematic Statistics

The notion of systematic statistics goes back to Mosteller (1946); we mention this expression for historical reasons only because nowadays one would speak …
… = O(n^(−1/2)), where … = μ + σF⁻¹(q).     (6.2.6)

PROOF. …

Under the conditions of Lemma 6.2.3 we obtain for the sample quantiles X_{[nq₀]:n} of n i.i.d. random variables with common d.f. F_{μ,σ} that, with u_j = u_j(F) as in Lemma 6.2.3,

    sup_t | P{ … (X_{[nq₀]:n} − …) ≤ t } − … | = O(n^(−1/2)).     (6.2.7)
and

    (1 − γ)X_{r:n} + γX_{r+1:n},    γ ∈ [0, 1],

which may be used as estimators of the q-quantile. The most important case is the sample median for even sample sizes.

It is apparent that this statistic has the same asymptotic behavior for every γ ∈ [0, 1] as far as the first order performance is concerned. The different performance of the statistics for varying γ can be detected if the second order term is studied. For this purpose we shall establish an expansion of length 2.
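A minimal sketch of the interpolated estimator (1 − γ)X_{r:n} + γX_{r+1:n} (the rank convention r = [nq] used below is my own assumption for illustration; it is not fixed by the text):

```python
def interpolated_quantile(xs, q, gamma=0.5):
    """(1 - gamma)*X_{r:n} + gamma*X_{r+1:n} as an estimator of the
    q-quantile; gamma = 0.5 and q = 1/2 give the usual sample median
    for even sample sizes."""
    xs = sorted(xs)
    n = len(xs)
    r = max(1, min(n - 1, int(q * n)))  # assumed rank convention r = [nq]
    return (1 - gamma) * xs[r - 1] + gamma * xs[r]

# sample median of an even-sized sample: average of the two middle values
assert interpolated_quantile([4, 1, 3, 2], 0.5) == 2.5
```

As the text notes, every γ ∈ [0, 1] gives the same first order behavior; the choice of γ only shows up in the second order term.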
Denote by F_{r,n} the d.f. of c⁻¹(X_{r:n} − d). From Corollary 1.8.5 it is immediate that for γ and t,

    P{(1 − γ)X_{r:n} + γX_{r+1:n} ≤ …} = F_{r,n}(t) − ∫ P{ γ[Y_{1:n−r} − (d + …)] … } …     (6.2.8)

    sup | … − dG_{n,r} | = O(n⁻¹),     (6.2.9)

and

    sup_t | P{(1 − γ)X_{r:n} + …} − [ … (n − r)F_{d+cx}[d + cx + c(t − x)/γ] … dG_{r,n}(x) ] | = O(n⁻¹).     (6.2.11)
Notice that if γ = 0 then, in view of (6.2.10), the integral in (6.2.11) can be replaced by zero.

Specifying normalizing constants and an expansion G_{r,n} of F_{r,n} we obtain the following theorem.
Theorem 6.2.4. Let q ∈ (0, 1) be fixed. Assume that F has three bounded derivatives on a neighborhood of F⁻¹(q) and that f(F⁻¹(q)) > 0 where f = F′. Suppose that (r(n)/n − q) = O(n^(−1/2)). Put σ² = q(1 − q). Then,

    sup_t | P{ (n^(1/2) f(F⁻¹(q))/σ) [(1 − γ)X_{r(n):n} + γX_{r(n)+1:n} − F⁻¹(q)] ≤ t }
             − (Φ(t) + n^(−1/2) φ(t) R_n(t)) | = o(n^(−1/2))

where

    R_n(t) = −[ (1 − 2q)/(3σ) − σf′(F⁻¹(q))/(2f(F⁻¹(q))²) ] t²
             − [ (q − nq + r(n) + γ − 1)/σ + 2(1 − 2q)/(3σ) ].
    ∫ … exp[ −(n − r)F_{d+cx}[d + cx + c(t − x)/γ] ] … φ(x) dx     (1)

and

    ∫ … (n − r)F_{d+cx}[d + cx + c(t − x)/γ] …     (2)

uniformly in γ and t. The proof of (1) will be carried out in detail. Similar arguments lead to (2).

Since Φ(−(log n)/2) = O(n⁻¹) it is obvious that ∫_{−∞} can be replaced by ∫_{−log n} where t ≤ (log n)/2. Then, the integrand is of order O(n⁻¹) for those x with c(t − x)/γ > s(log n)/n for some sufficiently large s > 0. Thus, ∫_{−∞} can be replaced by ∫_{u(n)} where u(n) = max(−log n, t − γs(log n)/cn).

Under the condition that F has three bounded derivatives it is not difficult to check that for u(n) ≤ x ≤ t,

    F_{d+cx}[d + cx + c(t − x)/γ] = f(d)c(t − x)/((1 − q)γ) + O[ c|x|(c(t − x)/γ) + (c(t − x)/γ)² ].     (3)
Thus, (1) has to be verified with the left-hand side replaced by the term

    ∫ exp[ −(n − r) … ] n^(1/2) σ(t − x) … φ(x) dx     (4)

which, by substituting y = n^(1/2)σ(t − x)/γ, equals

    n^(−1/2)(γ/σ) ∫_0^{v(n)} exp[ −((1 − r/n)/(1 − q)) y ] φ(t − n^(−1/2)γy/σ) dy.     (5)

Moreover,

    exp[ −((1 − r/n)/(1 − q)) y ] = exp(−y)[1 + o(n⁰)]

and

    φ(t − n^(−1/2)γy/σ) = φ(t)[1 + o(n⁰)],     (6)

so that (5) equals

    n^(−1/2)(γ/σ) φ(t) ∫_0^{v(n)} exp(−y) dy (1 + o(n⁰)).     (7)

Notice that the relations above hold uniformly in γ and t. Now (1) is immediate. □
0
Notice that for y = 0 we again get the expansion of length two of the
normalized dJ. of Xr(n):n as given in P.4.5. Moreover, for y = 0 and for r(n)
replaced by r(n) + 1 we get the same expansion as for p = 1 and r(n).
If q = 1/2, f′(F⁻¹(1/2)) = 0, n = 2m, and r = m then

    P{ 2(2m)^(1/2) f(F⁻¹(1/2)) [ (X_{m:2m} + X_{m+1:2m})/2 − F⁻¹(1/2) ] ≤ t } = Φ(t) + o(n^(−1/2)).     (6.2.12)

Thus, the sample median for even sample sizes is asymptotically normal with a remainder term of order o(n^(−1/2)). For odd sample sizes the corresponding result was proved in Section 4.2.
Remark 6.2.5. Let q₀ ∈ (0, 1). Assume that F has three bounded derivatives on a neighborhood of F⁻¹(q₀) and that f(F⁻¹(q₀)) > 0. Then a short examination of the proof of Theorem 6.2.4 reveals that the assertion holds uniformly over all q in a sufficiently small neighborhood of q₀ and r(n) ≡ r(q, n) such that sup_q |r(q, n)/n − q| = o(n^(−1/2)). This yields the version of Theorem 6.2.4 as cited in Pfanzagl (1985).
    T_n = n⁻¹ Σ_{i=1}^{n} J(i/(n + 1)) X_{i:n}     (6.2.13)

are estimators of the functional

    μ(F) = ∫ J(s)F⁻¹(s) ds     (6.2.14)
         = ∫ xJ(F(x)) dF(x)     (6.2.15)

and

    σ² = ∫∫ J(F(x))J(F(y))(min(F(x), F(y)) − F(x)F(y)) dx dy > 0.
Motivation
To get some insight into the nature of this approximation let us consider the special case of i.i.d. (0, 1)-uniformly distributed r.v.'s η₁, η₂, …, η_n. Denote by G_n and U_{i:n} the pertaining sample d.f. and the ith order statistic. We already
know that the distributions of

    G_n(r/n) = n⁻¹ Σ_{i=1}^{n} 1_{(−∞, r/n]}(η_i)

and of U_{r:n} are concentrated about r/n. Moreover, relation (1.1.6) shows that pointwise

    U_{r:n} − r/n ≤ … iff G_n(r/n) − r/n ≥ ….     (6.3.1)

…

    … + (G_n(r/n) − r/n) …,    q ∈ (0, 1),     (6.3.2)

and

    …,    q ∈ (0, 1),     (6.3.3)

where F_n and F_n⁻¹ are the sample d.f. and sample q.f. based on the r.v.'s ξ_i. The connection between (6.3.2) and (6.3.3) becomes obvious by noting that the transformation technique yields

    f(F⁻¹(q))(F_n⁻¹(q) − F⁻¹(q)) + (F_n(F⁻¹(q)) − q)
      = f(F⁻¹(q))(F⁻¹(G_n⁻¹(q)) − F⁻¹(q)) + (G_n(F(F⁻¹(q))) − q),     (6.3.4)

and hence results for the Bahadur statistic in the uniform case can easily be extended to continuous d.f.'s F.
Theorem 6.3.1. For every s > 0 there exists a constant C(s) such that

    P{ |(G_n⁻¹(q) − q) + (G_n(q) − q)| > (s + 3) max{(q(1 − q))^(1/4), …} … (log n)^(1/2) t  for some q }
      ≤ … + 2 Σ_{m≥1} (−1)^(m+1) e^(−2m²t⁴) ….
    Q_n(A) = n⁻¹ Σ_{i=1}^{n} 1_A(η_i)

where the η_i are i.i.d. random variables with common uniform distribution Q₀ on (0, 1). Recall that the Glivenko–Cantelli theorem yields

    … → 0,    n → ∞,  w.p. 1.     (6.3.5)
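The Glivenko–Cantelli statement (6.3.5) is easy to verify numerically; the following sketch (my illustration, not the book's code) evaluates the maximal deviation of the sample d.f. G_n of uniform r.v.'s from the uniform d.f. at the jump points of G_n:

```python
import random

# Sketch: sup_t |G_n(t) - t| for n i.i.d. (0,1)-uniform r.v.'s; by the
# Glivenko-Cantelli theorem this tends to 0 with probability one.
random.seed(2)

def ks_uniform(n):
    us = sorted(random.random() for _ in range(n))
    # the supremum is attained at a jump point of the sample d.f.
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(us))

d_small, d_large = ks_uniform(100), ks_uniform(10000)  # deviation shrinks with n
```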
Lemma 6.3.2. For every s > 0 there exists a constant A(s) such that for every n:

    P{ sup_I n^(1/2)|Q_n(I) − Q₀(I)| / max{σ(I), ((log n)/n)^(1/2)} ≥ (s + 3)(log n)^(1/2) } ≤ A(s)n^(−s).     (6.3.6)

PROOF. …

    P{ n^(1/2)|Q_n(I) − Q₀(I)| / max{σ(I), β/n^(1/2)} ≥ ε } ≤ …     (1)

where σ̃(I) = max{σ²(I) − 2/n − 4/n², β²/n}^(1/2) − 2n^(−1/2). Let I ∈ 𝓘₀ be fixed. Assume w.l.g. that σ(I) > 0 and εβ ≥ 7/2 so that σ̃(I) > 0. Using the exponential bound (3.1.1) with t = σ̃(I)/max{(σ²(I) − 2/n − 4/n²)/β², 1/n}^(1/2) we obtain

    … ≤ 2 exp[ −εβ + 7/4 + (3/2)β² + 3/n ].
Remark 6.3.3. Lemma 6.3.2 holds for any i.i.d. random variables (with arbitrary common distribution Q in place of Q₀). The general case can be reduced to the special case of Lemma 6.3.2 by means of the quantile transformation.
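The quantile transformation mentioned in Remark 6.3.3 can be sketched as follows (the exponential d.f. is an assumed example, not one used in the book): if U is (0, 1)-uniformly distributed then F⁻¹(U) has d.f. F.

```python
import math
import random

# Sketch: F^{-1}(U) has d.f. F; here F(x) = 1 - e^{-x} (exponential)
# with quantile function F^{-1}(u) = -log(1 - u).
random.seed(5)
n = 20000
xs = [-math.log(1.0 - random.random()) for _ in range(n)]

mean = sum(xs) / n                            # exponential(1) has mean 1
frac = sum(x <= math.log(2) for x in xs) / n  # F(log 2) = 1/2
```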
Lemma 6.3.2 together with the Borel–Cantelli lemma yields

    lim sup_n sup_{I ∈ 𝓘_n} n^(1/2)|Q_n(I) − Q₀(I)| / (σ(I)(log n)^(1/2)) ≤ 5  w.p. 1     (6.3.7)

and

    … = 1  w.p. 1     (6.3.8)

where 𝓘_n* = {I ∈ 𝓘: I = (a, b], αa_n ≤ Q₀(I) ≤ βa_n} with 0 < α < β < ∞, and a_n has the properties a_n ↓ 0, na_n ↑ ∞, log a_n⁻¹ = o(na_n) and (log a_n⁻¹)/(log log n) → ∞ as n → ∞. Note that (6.3.8) shows that the rate in (6.3.7) is sharp.

Theorem 6.3.1 will be an immediate consequence of Lemma 6.3.2 and Lemma 3.1.5 which concerns the maximum deviation of the sample q.f. G_n⁻¹ from the (0, 1)-uniform q.f.
PROOF OF THEOREM 6.3.1. Since |G_n(G_n⁻¹(q)) − q| ≤ 1/n we obtain

    |G_n⁻¹(q) − q + (G_n(q) − q)| ≤ sup_{|x−q| ≤ κ} |x − G_n(x) + G_n(q) − q| + 1/n

whenever |G_n⁻¹(q) − q| ≤ κ, where I(q) runs over all intervals (x, q] and (q, x] with |x − q| ≤ κ. Thus, by Lemma 6.3.2 and Lemma 3.1.5 applied to κ = κ(q, s, n), we get

    P{ |G_n⁻¹(q) − q + (G_n(q) − q)| ≤ …,  q ∈ (0, 1) }
      ≥ P{ sup |Q_n(I(q)) − Q₀(I(q))| ≤ (s + 3)((log n)/n)^(1/2) κ(q, s, n)^(1/2),  q ∈ (0, 1) } − B(s)n^(−s)
      ≥ 1 − [A(s) + B(s)]n^(−s)

where A(s) and B(s) are the constants of Lemma 6.3.2 and Lemma 3.1.5. The proof is complete. □
Introduction
Since the sample d.f. F_n is a natural nonparametric estimator of the unknown underlying d.f. F it is plausible that the statistical functional T(F_n) is an appropriate estimator of T(F) for a large class of functionals T.

In connection with covering probabilities and confidence intervals one is interested in the d.f.

    T_n(F, t) = P_F{T(F_n) − T(F) ≤ t}

of the centered statistic T(F_n) − T(F).

The basic idea of the bootstrap approach is to estimate the d.f. T_n(F, ·) by means of the bootstrap d.f. T_n(F_n, ·). Thus, the underlying d.f. F is simply replaced by the sample d.f. F_n. Let us touch on the following aspects:

(a) the calculation of the bootstrap d.f. by enumeration or, alternatively, by Monte Carlo resampling,
(b) the validity of the bootstrap approach,
(c) the construction of confidence intervals for T(F) via the bootstrap approach.
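Aspect (a) can be sketched for the sample median as follows (an assumed example; the function names are mine and the Monte Carlo sizes are arbitrary): the bootstrap d.f. T_n(F_n, ·) is approximated by resampling with replacement from the original sample.

```python
import random
import statistics

# Sketch: Monte Carlo approximation of the bootstrap d.f. of the
# centered sample median T(F_n*) - T(F_n).
random.seed(0)
n, B = 50, 2000
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
theta_hat = statistics.median(sample)  # T(F_n)

boot_errors = []
for _ in range(B):
    resample = [random.choice(sample) for _ in range(n)]
    boot_errors.append(statistics.median(resample) - theta_hat)

def bootstrap_df(t):
    """Monte Carlo estimate of the bootstrap d.f. at t."""
    return sum(e <= t for e in boot_errors) / B
```

Exact enumeration of all resamples is feasible only for very small n, which is why Monte Carlo resampling is the standard way of computing the bootstrap d.f. in practice.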
    Σ_{i=1}^{n} 1_{(−∞, t]}(x_i),

…

    … ≤ t} = 1 − [1 − F(α(F) + t)]^n

and … If F is continuous then w.p. 1,

    T_n(F_n, 0) − T_n(F, 0) = 1 − (1 − 1/n)^n → 1 − exp(−1),    n → ∞.
    … F(F⁻¹(q) + t))^(n−i) …     (6.4.2)

and the same representation holds for T_n(F_n, t) with F⁻¹ replaced by F_n⁻¹. From Theorem 4.1.4 we know that T_n(F, t), suitably normalized, approaches the standard normal d.f. Φ as n → ∞. The normalized version of T_n(F, t) is given by

    …     (6.4.3)

if F = Φ.

To prove that the bootstrap d.f. T_n(F_n, ·) is a consistent estimator of T_n(Φ, ·) one has to show that T_n*(F_n, t) → Φ(t), n → ∞, for every t, w.p. 1.
Figure 6.4.1. Normalized d.f. T_n*(Φ, ·) of the sample q-quantile and bootstrap d.f. T_n*(F_n, ·) for q = 0.4 and n = 20, 200.
The numerical calculations above were carried out by using the normal approximation to the d.f. of the sample quantile of i.i.d. (0, 1)-uniformly distributed r.v.'s. Otherwise, the computation of the binomial coefficients would cause numerical difficulties. Computations for the sample size n = 20 showed that the error of this approximation is negligible.

From Figure 6.4.1 we see that T₂₀*(Φ, ·) and T₂₀₀*(Φ, ·) are close together (and, by the way, close to Φ), indicating quick convergence. The bootstrap d.f. T_n*(F_n, ·) is a step function which slowly approaches Φ. Next, we indicate this rate of convergence.
Asymptotic Investigations

The further analysis will be simplified by using the normal approximation to the d.f. of the sample q-quantile of n i.i.d. (0, 1)-uniformly distributed r.v.'s. From Corollary 1.2.7 and (4.2.1), applied to m = 1, we deduce

    T_n(F_n, t/n^(1/2)) − T_n(F, t/n^(1/2)) = Φ(…) − … + O(n^(−1/2)) = … + o(1)     (6.4.4)

uniformly over t [where the second relation holds if F has a derivative, say, f(F⁻¹(q)) at F⁻¹(q)]. Moreover, the function g_{n,t} is defined by

    g_{n,t} = … (F_n(F_n⁻¹(q) + t/n^(1/2)) − F_n(F_n⁻¹(q))),    t ≠ 0,     (6.4.5)

and = … if t = 0.
The auxiliary function g_{n,t} is a "naive" estimator of the density at the random point F_n⁻¹(q). Thus, the stochastic behavior of the bootstrap error T_n(F_n, t/n^(1/2)) − T_n(F, t/n^(1/2)) is closely related to that of a density estimator. We have

    sup_t |T_n(F_n, t) − T_n(F, t)| → 0,    n → ∞,  w.p. 1,     (6.4.6)

that is, the bootstrap estimator is strongly consistent, if w.p. 1 for every t ≠ 0,

    g_{n,t} → f(F⁻¹(q)),    n → ∞.     (6.4.7)

Let us assume that F has a derivative, say, f near F⁻¹(q) and f is continuous at F⁻¹(q). From Lemma 3.1.7(ii) and the Borel–Cantelli lemma it follows that, w.p. 1, for every t ≠ 0,
    lim sup_n n^(1/4)(log log n)^(−1/2) sup_t |T_n(F_n, t) − T_n(F, t)| = K_{q,F} > 0  w.p. 1     (6.4.8)

where … and φ = Φ′.
Theorem 6.4.1. Assume that F is a continuous d.f. having a derivative f near F⁻¹(q) which satisfies a local Lipschitz condition of order δ > 1/2 and that f(F⁻¹(q)) > 0. Then, Z_n converges weakly to a process Z defined by

    Z(t) = B₁(−t)  if t ≤ 0,
    Z(t) = B₂(t)   if t > 0.

We refer to Falk and Reiss (1989) for a detailed proof of Theorem 6.4.1 and for a definition of the weak convergence on the set of all right continuous functions on the real line having left-hand limits.
The basic idea of the proof is to examine the expressions in (6.4.3) and (6.4.5) conditioned on the sample q-quantile F_n⁻¹(q). Notice that the r.v.'s g_{n,t} only depend on order statistics smaller (larger) than F_n⁻¹(q) if t ≤ 0 (if t > 0). Thus, it follows from Theorem 1.8.1 that, conditioned on F_n⁻¹(q), the processes

    (g_{n,t})_{t≤0}   and   (g_{n,t})_{t>0}

are conditionally independent. Theorem 6.4.1 reveals that we get the unconditional independence in the limit.
    … sup_{|t| ≤ 3} …     (6.4.9)

Figure 6.4.2. Normalized d.f. H_n of the maximum bootstrap error for q = 0.5 and n = 200, 2000, with H₂₀₀ ≤ H₂₀₀₀.
Confidence Bounds

Next, we consider the problem of setting two-sided confidence bounds for the unknown parameter T(F). First, let us look at the problem from the point of view of a practitioner. One has to find a random variable c_n(α) such that

    …     (6.4.10)

The bootstrap solution is to take c_n(α) such that the bootstrap d.f. satisfies

    … = 1 − α.

The validity of (6.4.10) can be made plausible by the argument that uniformly over all t … This idea will be made rigorous in the particular case of the q-quantile via asymptotic considerations.
If F has a derivative f near F⁻¹(q) and f is continuous at F⁻¹(q) then we know that …, n → ∞.

1. …

    ∫ … φ(x)( 1 + Σ_{i=1}^{3(m−1)} ((−1)^i / c_i) H_i(x) ) dx = O(n^(−m/2)).

(Reiss, 1974b)
2. Under the conditions of Theorem 6.1.2, with q = 1/2, m = 3 and odd sample sizes n,

    P{ X_{[n/2]+1:n} > F⁻¹(1/2) + λ/(2f₀n^(1/2)) − f₁λ²/(4n^(1/2)f₀²) − (λ + λ³(…)/(6f₀³))/(4n) } = α + O(n^(−3/2))
where b_j = (n − j …) … ξ₁, …, ξ_n. Show that

    P{ τ_n⁻¹ Σ_{i=1}^{n} a_i(X_{i:n} − EX_{i:n}) ≤ t } → Φ(t),    n → ∞,

if, and only if,

    max_j τ_n⁻¹|b_{j,n}| → 0,    n → ∞.
Bibliographical Notes
An approach related to that in Theorem 6.1.1 was adopted by Hodges and
Lehmann (1967) for expanding the variance of the sample median (without
rigorous proof). These investigations led to the famous paper by Hodges and
Lehmann (1970) concerning the second order efficiency (deficiency).
Concerning limit theorems for moments of extremes we refer to Pickands
(1968), Polfeldt (1970), Ramachandran (1984), and Resnick (1987).
Concerning linear combinations of order statistics we already mentioned
the book of Helmers (1982). A survey of other approaches for deriving limit
theorems for linear combinations of order statistics is given in the book of
Serfling (1980). A more recent result concerning linear combinations of order
statistics is due to van Zwet (1984): a representation as a symmetric statistic leads to a Berry–Esseen type theorem that is essentially equivalent to Theorem 6.2.6.
Limit laws for sums of extremes and intermediate order statistics have attracted considerable attention in recent years. This problem is related to that of the weak convergence of sums of i.i.d. random variables to a stable law (see Feller (1972)). Concerning weak laws we refer to the articles of M. Csörgő et al. (1986), S. Csörgő and D.M. Mason (1986), and S. Csörgő et al. (1986). A. Janssen (1988) proved a corresponding limit law w.r.t. the variational distance. An earlier notable article pertaining to this is that of Teugels (1981), among others.
Spacings and functions of spacings (understood in the greater generality of m-step spacings) are dealt with in several parts of the book, e.g. in the context of estimating the quantile density function. We did not make any attempt to cover this field to its full extent. For a comprehensive treatment of spacings see Pyke (1965, 1972). Several test statistics in nonparametric statistics are based on spacings. In the present context, the most interesting ones are perhaps those based on m-step spacings. For a survey of recent results we refer to the article of Jammalamadaka S. Rao and M. Kuo (1984). Interesting results concerning "systematic" statistics (including the χ²-test) are given by Miyamoto (1976).
A first improvement of Bahadur's original result in 1966 was achieved by Kiefer (1967), namely a law of the iterated logarithm analogue for the Bahadur approximation evaluated at a single point. Limit theorems like those stated in Section 6.3 are contained in the article of Kiefer (1969a). Further extensions concern (a) the weakening of conditions imposed on the underlying r.v.'s (see e.g. Sen, 1972) and (b) non-uniform bounds for the remainder term of the Bahadur approximation (e.g. Singh, 1979).
It was observed by Bickel and Freedman (1981) that bootstrapping leads to inconsistent estimators in the case of extremes. An interesting recent survey of various techniques related to the bootstrap was given by Beran (1985). We refer to Klenk and Stute (1987) for an application of the bootstrap method to linear combinations of order statistics.
CHAPTER 7
    …     (7.1.1)

    … I(j) …     (7.1.2)
have this property. We do not know whether this idea can be made rigorous, though.

The asymptotic normality of order statistics can be proved via the device of Section 2.1, namely, to represent the d.f. of order statistics as the d.f. of a sum of i.i.d. random vectors. To simplify the writing let us study the 2-dimensional case. According to Section 2.1 we have

    P{ Σ_{i=1}^{n} ( 1_{(−∞, t_{1,n}]}(ξ_{i,1}), 1_{(−∞, t_{2,n}]}(ξ_{i,2}) ) … ≥ r(n) … }     (7.1.3)

…,    n → ∞,  i = 1, 2,     (7.1.4)

where

    t_{i,n} = F_i⁻¹(q_i) + x_i/(n^(1/2) f_i),    i = 1, 2,     (7.1.5)

where F_i is the ith marginal d.f. of F and f_i = F_i′(F_i⁻¹(q_i)). Let us rewrite the right-hand side of (7.1.3) by

    …     (7.1.6)
and

    …     (7.1.8)

Obviously, η_{i,n}, i = 1, 2, …, n, are bounded i.i.d. random vectors with mean vector zero and covariance matrix Σ_n = (σ_{i,j,n}) given by

    σ_{i,i,n} = F_i(t_{i,n})(1 − F_i(t_{i,n})),    i = 1, 2,     (7.1.9)

and …

Theorem 7.1.1. Assume that F is continuous at the point (F₁⁻¹(q₁), F₂⁻¹(q₂)). Moreover, assume that

    f_i = F_i′(F_i⁻¹(q_i)) > 0,    i = 1, 2,

and

    …     (7.1.10)
If det(Σ) ≠ 0 and condition (7.1.4) holds then for every (x₁, x₂):

    … → Φ_Σ(x),    n → ∞,     (7.1.11)

where Φ_Σ is the bivariate normal d.f. with mean vector zero and covariance matrix Σ.

PROOF. Let Σ_n and η_{i,n} be as in (7.1.9) and (7.1.7). Since Σ_n → Σ, n → ∞, we may assume w.l.g. that det(Σ_n) ≠ 0. Let T_n be a matrix such that T_n² = Σ_n⁻¹ [compare with Bhattacharya and Rao (1976), (16.3) and (16.4)]. Then, according to a Berry–Esseen type theorem (see Bhattacharya and Rao (1976), Corollary 18.3) we get

    |… − Φ_{Σ_n}(z)| ≤ cn^(−1/2) E‖T_n η_{1,n}‖₂³ = O(n^(−1/2))     (7.1.12)

for some constant c > 0. Here ‖·‖₂ denotes the Euclidean norm.

The differentiability of F_i at F_i⁻¹(q_i) and condition (7.1.4) yield that x_{i,n} → x_i, n → ∞, and hence

    …,    n → ∞.     (7.1.13)
The error rates in (7.1.11) can easily be computed under slightly stronger regularity conditions imposed on F.

The condition det(Σ) ≠ 0 is rather a mild one. If ξ_i = (ζ_i, ζ_i) are random vectors having the same r.v. in both components then det(Σ) = 0 if q₁ = q₂ and det(Σ) ≠ 0 if q₁ ≠ q₂. It is clear that the two procedures of taking two order statistics X_{r:n}, X_{s:n} according to ζ₁, …, ζ_n, or order statistics X^(1)_{r:n}, X^(2)_{s:n} according to ξ₁, …, ξ_n, are identical. Thus, the situation of Section 4.5 can be regarded as a special case of the multivariate one.
Next we give a straightforward generalization of Theorem 7.1.1 to the case d ≥ 3. We take one order statistic X^(i)_{r_i(n):n} out of each of the d components. Suppose that

    …,    n → ∞,  i = 1, …, d.     (7.1.14)

Define Σ = (σ_{i,j}) by

    σ_{i,i} = …,    i = 1, …, d,

and

    …     (7.1.15)
For x = (x₁, …, x_d),

    P{ … ≤ x_i, i = 1, …, d } → Φ_Σ(x),    n → ∞,     (7.1.16)

where Φ_Σ is the d-variate normal d.f. with mean vector zero and covariance matrix Σ.
Weak Convergence
The weak convergence is again the pointwise convergence of d.f.'s if the limiting d.f. is continuous, which will always be assumed in this section. The weak convergence of d-variate d.f.'s implies the weak convergence of the univariate marginal d.f.'s (since the projections are continuous). In particular, if F_n^n weakly converges to G₀ then the univariate marginal d.f.'s F_{n,1}^n also converge weakly to the univariate marginal G₀,₁ of G₀. Notice that G₀ also has identical univariate marginals. If G₀,₁ is nondegenerate then the results of Chapter 5 already give some insight into the present problem.

Recall from Section 2.2 that the d-variate d.f.'s x → ∏_{i=1}^{d} G₀,₁(x_i) and x → G₀,₁(min(x₁, …, x_d)) represent the cases of independence and complete dependence.
Lemma 7.2.1. Assume that the univariate marginals F_{n,1}^n converge pointwise to the d.f. G₀,₁.

(i) Then, for every x,

    ∏_{i=1}^{d} G₀,₁(x_i) ≤ lim inf_n F_n^n(x) ≤ lim sup_n F_n^n(x) ≤ G₀,₁(min(x₁, …, x_d)).
    F_n^n(x) ≥ exp[ −Σ_{j=1}^{d} n(1 − F_{n,1}(x_j)) ] + o(1)
             = ∏_{j=1}^{d} exp[ −n(1 − F_{n,1}(x_j)) ] + o(1) = ∏_{j=1}^{d} G₀,₁(x_j) + o(1).     (7.2.1)
Let ξ = (ξ₁, …, ξ_d) be a random vector with d.f. F. Recall from P.2.5 that for some universal constant C > 0,

    sup_t | F^n(t) − exp(…) | ≤ C …     (7.2.2)

where

    h_j = …     (7.2.3)

Corollary 7.2.2. Let ξ_n be a d-variate random vector with d.f. F_n. Define h_{n,j} in analogy to h_j in (7.2.3) with ξ replaced by ξ_n. Suppose that the univariate marginals F_{n,1}^n converge pointwise to a d.f. Moreover, suppose that for every j = 1, …, d,

    … → h_{0,j},    n → ∞,  pointwise.

Then

(i)

    G₀ = exp( Σ_{j=1}^{d} (−1)^j h_{0,j} )

is a d.f., and

(ii)

    F_n^n(x) → G₀(x),    n → ∞,  for every x.
    H_λ(x, y) = exp[ −Φ(λ + (x − y)/(2λ))e^(−y) − Φ(λ + (y − x)/(2λ))e^(−x) ]     (7.2.4)

with

    H₀ = lim_{λ↓0} H_λ   and   H_∞ = lim_{λ↑∞} H_λ.

…

    | … exp(… + nL_n(x, y)) | ≤ Cn⁻¹     (7.2.5)

where

    L_n(x, y) = P{ξ_n > x, η_n > y}.

…

    F_{n,1}^n(x) → G₀,₁(x),     (7.2.6)

    … + o(1),     (7.2.7)

    …,    n → ∞.     (7.2.8)
PROOF. Notice that (b_n + a_n x) ↑ ω(F) and n(1 − F(b_n + a_n x)) → −log G(x), n → ∞, for α(G₀) < x < ω(G₀), and hence the assertion is immediate from Lemma 7.2.3 applied to ξ_n = a_n⁻¹(ξ − b_n) and η_n = a_n⁻¹(η − b_n). □

It is well known that (7.2.9) is also necessary for the asymptotic independence. Moreover, Corollary 7.2.4 can easily be extended to the d-variate case (see Galambos (1987), page 301, and Resnick (1987), Proposition 5.27).

Next, Lemma 7.2.3 will be applied to prove that, for multivariate extremes, the asymptotic pairwise independence of the marginal maxima implies asymptotic independence.
    P{M_n ≤ x} ≤ exp( −Σ_{i=1}^{d} n(1 − F_{n,1}(x_i)) + Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j) ) + o(1)
               = ∏_{i=1}^{d} G₀,₁(x_i) exp( Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j) ) + o(1),

    P{M_n ≤ x} ≥ exp( −Σ_{i=1}^{d} n(1 − F_{n,1}(x_i)) ) + o(1) = ∏_{i=1}^{d} G₀,₁(x_i) + o(1),

where

    L_{n,i,j}(x_i, x_j) = …

It remains to prove that

    exp( Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j) ) → 1,    n → ∞,
for every x with ∏_{i=1}^{d} G₀,₁(x_i) > 0. This, however, is obvious from the fact that according to Lemma 7.2.3 the pairwise independence implies

    nL_{n,i,j}(x_i, x_j) → 0,    n → ∞.     (7.2.10)

… independence of ξ₁, …, ξ_d …

    … = F_{n,1}(x)F_{n,1}(y) exp[nL(x, y)] + O(n⁻¹).     (7.2.11)

From (7.2.11) we see that the term nL_n(x, y) determines the rate at which the independence of the marginal maxima is attained.

It is apparent from the proof of Theorem 7.2.5 that (7.2.11) can easily be extended to the case d ≥ 2.
Next (7.2.11) will be specialized to bivariate normal vectors. It was observed by Sibuya (1960) that the marginal maxima of i.i.d. normal random vectors are asymptotically independent. In the following example we shall calculate the rate at which the marginal maxima become quadrant-independent.

EXAMPLE 7.2.7. Let F be the d.f. of a normal vector (ξ, η) where ξ and η are standard normal r.v.'s. Let ρ denote the covariance of ξ and η where −1 < ρ < 1. Put u_n(x) = b_n + b_n⁻¹x where again b_n = nφ(b_n). Then, for every x, y,

    F^n(u_n(x), u_n(y)) = Φ^n(u_n(x))Φ^n(u_n(y))[1 + O(n^(−(1−ρ)/(1+ρ))(log n)^(−ρ/(1+ρ)))] + O(n⁻¹).     (7.2.12)
It is well known that the normal distribution N(ρz, 1 − ρ²) is the conditional distribution of ξ given η = z. Thus,

    nL(u_n(x), u_n(y)) = n ∫_{u_n(y)}^{∞} ( 1 − N_{(ρz, 1−ρ²)}((−∞, u_n(x)]) ) φ(z) dz = O( b_n⁻¹ φ(b_n)^((1−ρ)/(1+ρ)) )

where the final step is carried out by using the inequality 1 − Φ(x) ≤ φ(x)/x, x > 0. We remark that for ρ > 0 the integration over z with y ≤ (u_n(x) − ρu_n(z))/(1 − ρ²)^(1/2) ≤ b_n has to be dealt with separately. Since b_n = O((log n)^(1/2)) the proof can easily be completed.
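The effect computed in Example 7.2.7 can be observed in a small simulation (a sketch; ρ = 0.5 and the sample sizes below are assumptions of mine): the correlation of the componentwise maxima of n i.i.d. bivariate normal pairs decreases as n grows, reflecting the asymptotic independence.

```python
import math
import random

# Sketch: correlation of componentwise maxima of n i.i.d. bivariate
# normal pairs with correlation rho; it decays as n grows.
random.seed(3)
rho = 0.5

def max_pair(n):
    mx = my = -math.inf
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        mx = max(mx, z1)
        my = max(my, rho * z1 + math.sqrt(1 - rho * rho) * z2)
    return mx, my

def corr_of_maxima(n, trials=2000):
    pairs = [max_pair(n) for _ in range(trials)]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / trials, sum(ys) / trials
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
    vx = sum((x - mx) ** 2 for x in xs) / trials
    vy = sum((y - my) ** 2 for y in ys) / trials
    return cov / math.sqrt(vx * vy)

c_small, c_large = corr_of_maxima(5), corr_of_maxima(500)  # c_large < c_small
```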
Final Remarks
If one confines the attention to asymptotically independent r.v.'s then it is natural to replace, in a first step, the original marginal r.v.'s by some independent versions. The calculation of an upper bound of the Hellinger distance between the distribution of a multivariate maximum and the joint distribution of the independent versions of the marginals is an open problem. In a second step one could apply Lemma 3.3.10 and the results of Section 5.2 to obtain an upper bound of the Hellinger distance between the original distribution and a limit distribution.
If we analyze the density of the normalized bivariate maximum, with
normalizing constants an > 0 and bn, in the form as given in (2.2.8), we
find that the decisive condition for the asymptotic independence, in the
strong sense, is that the conditional d.f.'s F_1(b_n + a_n x | b_n + a_n y) and F_2(b_n + a_n y | b_n + a_n x) converge to 1 as n → ∞. Recall that the related condition (7.2.9) yields the asymptotic independence in the weak sense.
In the case of asymptotic independence the statistical results in the univariate case carry over to the multivariate case. If the marginals are asymptotically dependent then new statistical problems have to be solved (see e.g. P.2.11 and Bibliographical Notes).
Problems

1. Denote by M_{r,s,n} the number of random vectors ξ_i in the random quadrant (−∞, X_{r:n}^{(1)}] × (−∞, X_{s:n}^{(2)}]. Under the conditions of Theorem 7.1.1 the random vectors (M_{r,s,n}, X_{r:n}^{(1)}, X_{s:n}^{(2)}) are asymptotically normal.
(Siddiqui, 1960)
238
2. (i) Let ξ = (ξ_1, ..., ξ_d) be associated; that is, cov(g(ξ), f(ξ)) ≥ 0 for all componentwise nondecreasing, real-valued functions f, g, whenever the relevant expectations exist.
(Marshall and Olkin, 1983; for an extension see Resnick, 1987)
(ii) If ξ_1, ..., ξ_d are associated and uncorrelated then they are mutually independent.
(Joag-Dev, 1983)
Bibliographical Notes
Under slightly stronger conditions than those stated in Theorems 7.1.1 and 7.1.2, Weiss (1964) proved the asymptotic normality of the d.f.'s of multivariate central order statistics. The proof is based on the normal approximation of the multinomial distribution.
The asymptotic normality of multivariate central order statistics was
already proved by Mood (1941), in the special case of sample medians, and
by Siddiqui (1960). In both articles the exact densities of the order statistics
are computed. By using the normal approximation of the multinomial distribution it is then shown that the densities converge pointwise to the normal
density. Thus, according to the Scheffé lemma, one also gets the convergence in the variational distance. Kuan and Ali (1960) verified the joint asymptotic normality of multivariate order statistics, including the case where several
normality of multivariate order statistics, including the case where several
order statistics are taken from each component. It is evident that such ordered
values define a grid in the Euclidean d-space. The frequencies of sample points
in the cells of the grid define further r.v.'s. Weiss (1982) proved the joint
asymptotic normality of multivariate order statistics and such associated cell
frequencies.
The research work on multivariate maxima of i.i.d. random vectors started
with the articles of J. Tiago de Oliveira (1958), J. Geffroy (1958/59), and
M. Sibuya (1960). In the literature, further reference is given to Finkelstein (1953).
From the beginning much attention was focused on the case where the
marginal maxima are asymptotically independent. It was observed by S.M.
Berman (1961) that for the components of an extreme value vector the independence is equivalent to the pairwise independence. In this context one
also has to note that the marginal maxima are asymptotically, mutually
(quadrant-) independent when, and only when, this is true for each pair of
marginal maxima [see e.g. Galambos (1987, Corollary 5.3.1) or Resnick (1987,
Proposition 5.27)].
If measurements of a certain phenomenon are made at places close together
then there will be a certain dependence between the observations which, in
the present context, are supposed to be maxima. From the results of Section
2.2 it is apparent that the family of max-stable distributions is large enough
to serve as a model for this situation. One may argue that this model is even
PART III
STATISTICAL MODELS
AND PROCEDURES
CHAPTER 8
In this chapter
if the underlying model is large enough. The test, estimation, and confidence
procedures have to be randomized to satisfy the usual requirements in an exact
way (e.g. attainment of a level or median unbiasedness).
Let ξ_1, ..., ξ_n be i.i.d. random variables with common d.f. F. Define r(α) as the largest integer r such that

Σ_{i=0}^{r−1} (n choose i) q^i (1 − q)^{n−i} ≤ α.   (8.1.1)

Notice that the left-hand side of (8.1.1) is equal to P{X_{r:n} > F^{-1}(q)}. Keep in mind that r(α) also depends on q and n. Put X_{0:n} = −∞ and X_{n+1:n} = ∞. It is clear that the test which rejects the null-hypothesis {F : F^{-1}(q) ≤ u} whenever X_{r(α):n} > u has level α; however, the level α will not be attained on the null-hypothesis except in those cases where equality holds in (8.1.1).
To define a test which is similar on {F : F^{-1}(q) = u} we introduce a randomized test procedure based on two order statistics. Define the critical function φ by

φ = 1 if X_{r(α):n} > u,
φ = γ(α) if X_{r(α):n} ≤ u, X_{r(α)+1:n} > u,   (8.1.2)
φ = 0 if X_{r(α)+1:n} ≤ u,

where γ(α) ≡ γ(α, q, n) is the unique solution of the equation

Σ_{i=0}^{r(α)−1} (n choose i) q^i (1 − q)^{n−i} + γ (n choose r(α)) q^{r(α)} (1 − q)^{n−r(α)} = α   (8.1.3)

with 0 ≤ γ < 1.
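The computation of r(α) and γ(α) from (8.1.1) and (8.1.3) is elementary; the following Python sketch (our own illustration, the function names are not from the text) evaluates the binomial sums directly:

```python
from math import comb

def binom_cdf_terms(n, q, r):
    # sum_{i=0}^{r-1} (n choose i) q^i (1-q)^(n-i), the left-hand side of (8.1.1)
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(r))

def randomized_quantile_test(n, q, alpha):
    # r(alpha): the largest r such that the binomial sum stays <= alpha
    r = 0
    while binom_cdf_terms(n, q, r + 1) <= alpha:
        r += 1
    # gamma(alpha): randomization constant solving (8.1.3), 0 <= gamma < 1
    atom = comb(n, r) * q**r * (1 - q)**(n - r)
    gamma = (alpha - binom_cdf_terms(n, q, r)) / atom
    return r, gamma
```

For n = 10, q = 1/2, α = 0.1 this yields r(α) = 3 with γ(α) ≈ 0.387, and the randomized test attains the level exactly.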
Simple calculations show that the left-hand side of (8.1.3) is equal to E_F φ if F^{-1}(q) = u. We have

E_F φ ≤ α if F^{-1}(q) ≤ u, with equality if F^{-1}(q) = u.   (8.1.4)
Moreover,

E_F φ = α if F(u) = q.
Lemma 8.1.1 follows from the Neyman–Pearson lemma: the likelihood ratio test rejects iff

∏_{i=1}^n (f_1/f_0)(ξ_i) > c,

where, with S_n = Σ_{i=1}^n 1_{(−∞,u]}(ξ_i),

∏_{i=1}^n (f_1/f_0)(ξ_i) = ( q_1(1 − q) / (q(1 − q_1)) )^{S_n} ( (1 − q_1)/(1 − q) )^n,

and hence the test rejects iff

S_n log( q_1(1 − q) / (q(1 − q_1)) ) > c′.

Since q_1 < q the logarithm is negative, so rejection occurs for small values of S_n; notice that

X_{r(α):n} > u iff S_n < r(α), and X_{r(α)+1:n} ≤ u iff S_n ≥ r(α) + 1.
Corollary 8.1.2. The critical function φ defined in (8.1.2) is uniformly most powerful of level α for testing F^{-1}(q) ≤ u against F^{-1}(q) > u.
PROOF. Obvious from (8.1.4) and Lemma 8.1.1 since the d.f. F_0 defined in Lemma 8.1.1 is continuous. □
(8.1.6)
The crucial point of the conditions above is that the derivatives above need not be uniformly bounded over the given model.
Lemma 8.1.3. Let k = 1, 2, ... or k = ∞ be fixed. Then, φ as defined in (8.1.2) is a uniformly most powerful critical function of level α for testing F^{-1}(q) ≤ u against F^{-1}(q) > u with F ∈ ℱ.
PROOF. Notice that F_0 (see the line before (8.1.5)) does not belong to ℱ. If f_1 is the density of F_1 ∈ ℱ then F_0 has the density

f_0 = f_1 [ (q/q_1) 1_{(−∞,u]} + ((1 − q)/(1 − q_1)) 1_{(u,∞)} ].   (8.1.7)

Applying Fatou's lemma, one can prove that every critical function ψ of level α on {F ∈ ℱ : F^{-1}(q) ≤ u} has the property E_{F_0} ψ ≤ α. Thus, Lemma 8.1.1 yields E_{F_1} ψ ≤ E_{F_1} φ and hence, φ is uniformly most powerful. □
(8.1.8)
and

γ(n) ≡ γ(1/2, q, n).

The median-unbiased estimator of the q-quantile is given by

(1 − γ(n)) X_{r(n):n} + γ(n) X_{r(n)+1:n}.   (8.1.10)

(8.1.11)
Put σ = (q(1 − q))^{1/2}/f(F^{-1}(q)). For the estimator θ̂_n in (8.1.10) one has

sup_t | P{ (n^{1/2}/σ)(θ̂_n − F^{-1}(q)) ≤ t } − Φ(t) | = o(n^{-1/2}).   (8.1.12)
Thus, F_n^{-1} generates increasing step functions which have jumps at the points i/n for i = 1, ..., n − 1. Throughout we define F_n^{-1}(0) = F_n^{-1}(0+) = X_{1:n} and F_n^{-1}(1) = F_n^{-1}(1−) = X_{n:n}.
If the underlying q.f. F^{-1} is continuous or differentiable then it is desirable to construct functions as estimates which share this property. Moreover, the information that F^{-1} is a smooth curve should be utilized to obtain estimators of a better statistical performance than that of the sample q.f. F_n^{-1}. The key idea will be to average over the order statistics close to the sample q-quantile for every q ∈ (0, 1).
The Polygon
In a first step we construct a piecewise linear version of the sample q.f. F_n^{-1} by means of linear interpolation. Thus, given a predetermined partition 0 = q_0 < q_1 < ... < q_k < q_{k+1} = 1 we get an estimator of the form

F_n^{-1}(q_{j−1}) + ((q − q_{j−1})/(q_j − q_{j−1})) [F_n^{-1}(q_j) − F_n^{-1}(q_{j−1})],  q_{j−1} ≤ q ≤ q_j.   (8.2.1)
For j = 2, ..., k we may take values q_j such that q_j − q_{j−1} = β for some appropriate "bandwidth" β > 0. This estimator evaluated at q is equal to the sample q-quantile if q = q_j and equal to [F_n^{-1}(q − β/2) + F_n^{-1}(q + β/2)]/2 if q = (q_{j−1} + q_j)/2 for j = 2, ..., k. Notice that the derivative of the polygon on (q_{j−1}, q_j) is equal to [F_n^{-1}(q_j) − F_n^{-1}(q_{j−1})]/(q_j − q_{j−1}).
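A minimal Python sketch of the sample q.f. and the polygon (8.2.1) may clarify the construction; the names `sample_qf` and `polygon_qf` are ours:

```python
import bisect
import math

def sample_qf(xs, q):
    # sample q.f.: F_n^{-1}(q) = X_{i:n} for (i-1)/n < q <= i/n,
    # with the endpoint conventions F_n^{-1}(0) = X_{1:n}, F_n^{-1}(1) = X_{n:n}
    xs = sorted(xs)
    if q <= 0:
        return xs[0]
    if q >= 1:
        return xs[-1]
    return xs[math.ceil(len(xs) * q) - 1]

def polygon_qf(xs, knots, q):
    # piecewise linear interpolation of the sample q.f. between the knots
    # 0 = q_0 < q_1 < ... < q_{k+1} = 1, as in (8.2.1)
    j = bisect.bisect_right(knots, q)
    if j <= 0:
        return sample_qf(xs, knots[0])
    if j >= len(knots):
        return sample_qf(xs, knots[-1])
    q0, q1 = knots[j - 1], knots[j]
    f0, f1 = sample_qf(xs, q0), sample_qf(xs, q1)
    return f0 + (q - q0) / (q1 - q0) * (f1 - f0)
```

At a knot the polygon reproduces the sample quantile; between knots it interpolates linearly.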
Moving Scheme
This gives reason to construct another estimator of F^{-1} by using a "moving scheme." For every q ∈ (0, 1) define the estimator of F^{-1}(q) by

[F_n^{-1}(q − β(q)) + F_n^{-1}(q + β(q))]/2   (8.2.2)

where the "bandwidth function" β(q) has to be defined in such a way that 0 ≤ q − β(q) < q + β(q) ≤ 1. Given a predetermined value β ∈ (0, 1/2) the bandwidth function β(q) can e.g. be defined by
β(q) = q if 0 < q < β,
β(q) = β if β ≤ q ≤ 1 − β,   (8.2.3)
β(q) = 1 − q if 1 − β < q < 1.
Alternatively, one may take

β(q) = q − q²/(4β) if 0 < q < 2β,
β(q) = β if 2β ≤ q ≤ 1 − 2β,   (8.2.4)
β(q) = (1 − q) − (1 − q)²/(4β) if 1 − 2β < q < 1,

where it is assumed that β ≤ 1/4. Notice that the bandwidth function in (8.2.4) is differentiable.
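The two bandwidth functions can be sketched as follows (our own naming; `beta` plays the role of the predetermined value β):

```python
def bandwidth_piecewise(q, beta):
    # (8.2.3): q near 0, the constant beta in the middle, 1 - q near 1
    if q < beta:
        return q
    if q > 1 - beta:
        return 1 - q
    return beta

def bandwidth_smooth(q, beta):
    # (8.2.4), assuming beta <= 1/4; differentiable at the break points
    # 2*beta and 1 - 2*beta
    if q < 2 * beta:
        return q - q * q / (4 * beta)
    if q > 1 - 2 * beta:
        return (1 - q) - (1 - q) ** 2 / (4 * beta)
    return beta
```

Both satisfy 0 ≤ q − β(q) and q + β(q) ≤ 1 for every q ∈ (0, 1), which is exactly the requirement stated above.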
The use of bandwidths depending on q can be justified by the following
arguments:
Since F_n^{-1}(q) is the natural, nonparametric estimator of F^{-1}(q) it is clear that (8.2.2) defines an estimator of [F^{-1}(q − β(q)) + F^{-1}(q + β(q))]/2 which in turn is approximately equal to F^{-1}(q) if F^{-1} is a smooth function near q and if β(q) is not too large. However, if q is close to one of the endpoints of the domain of F^{-1}, then one has to be cautious. If q or 1 − q is small then the usual q.f.'s (e.g. normal or exponential) do not fulfill the required smoothness condition. Thus, without further information about the form of the q.f. at the endpoints of (0, 1) a statistician should again adopt the sample q.f. or any estimator close to the sample q.f. This aim is achieved by using bandwidths as defined above.
The use of variable bandwidths also enters the scene when a pointwise optimal bandwidth (depending on the underlying d.f.) is estimated from the data. In this case the bandwidth is random and depends on the given argument q.
The polygon (Figure 8.2.1) and the moving scheme (Figure 8.2.2) are
based on n = 50 pseudo standard exponential random numbers. F- 1 is the
standard exponential q.f.
Figure 8.2.1. Polygon; n = 50, β = 0.1.
Figure 8.2.2. Moving scheme; n = 50, β = 0.1.
(X_{r(q):n} + X_{s(q):n})/2.   (8.2.5)
If n(q − β(q)) and n(q + β(q)) are not integers then we have r(q) = max(1, [n(q − β(q))] + 1) and s(q) = min(n, [n(q + β(q))] + 1).
Another ad hoc estimator of the q-quantile is a certain "trimmed mean"
defined by
(s(q) − r(q) + 1)^{-1} Σ_{i=r(q)}^{s(q)} X_{i:n}.   (8.2.6)
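The trimmed-mean estimator (8.2.6), with r(q) and s(q) as above, takes a few lines in Python (the helper name is ours):

```python
import math

def trimmed_mean_quantile(xs, q, beta_fn):
    # average of the order statistics X_{r(q):n}, ..., X_{s(q):n}, cf. (8.2.6),
    # with r(q) = max(1, [n(q - beta(q))] + 1), s(q) = min(n, [n(q + beta(q))] + 1)
    xs = sorted(xs)
    n = len(xs)
    b = beta_fn(q)
    r = max(1, math.floor(n * (q - b)) + 1)
    s = min(n, math.floor(n * (q + b)) + 1)
    return sum(xs[r - 1:s]) / (s - r + 1)
```

For a sample 1, ..., 10, q = 1/2 and a constant bandwidth 1/4 this averages the order statistics with indices 3, ..., 8.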
F_{n,0}^{-1}(q) = Σ_{i=1}^n a_{i,n}(q) X_{i:n}   (8.2.7)

where the scores satisfy

Σ_{i=1}^n a_{i,n}(q) = 1.   (8.2.8)
Within this class of estimators we shall study those where the scores are
defined by a kernel. The "trimmed mean" will be closely related to a kernel
estimator which is based on a uniform kernel.
∫ k(x, y) dy = 1.   (8.2.9)

The kernel method consists in smoothing a given function H_n via

∫ k(x, y) H_n(y) dy.   (8.2.10)

Integration by parts yields

∫ k(x, y) H_n(y) dy = ∫ K(x, y) dH_n(y) + H_n(a+)   (8.2.11)

if H_n(a+) and H_n(b−) exist and are finite where the function K is defined by

K(x, z) = ∫_z^b k(x, y) dy.   (8.2.12)
In the sequel the kernel will be of the form

k(x, y) = β(x)^{-1} u((x − y)/β(x)).   (8.2.13)

The corresponding estimator of the q.f. is

F_{n,0}^{-1}(q) = Σ_{i=1}^n a_{i,n}(q) X_{i:n}   (8.2.14)

where the score functions a_{i,n} are given by

a_{i,n}(q) = ∫_{(i−1)/n}^{i/n} k(q, y) dy.

Obviously, condition (8.2.9) implies that the scores a_{i,n}(q) satisfy condition (8.2.8).
Let u have the properties ∫ u(y) dy = 1 and u(x) = 0 for |x| > 1. Moreover, assume that the bandwidth function β satisfies the condition β(q) ≤ min(q, 1 − q); e.g. the bandwidth functions in (8.2.3) and (8.2.4) satisfy this condition. Then the kernel k defined in (8.2.13) satisfies (8.2.9). Now, F_{n,0}^{-1} can be written in the form

F_{n,0}^{-1}(q) = ∫ β(q)^{-1} u((q − y)/β(q)) F_n^{-1}(y) dy.   (8.2.15)

For β(q), defined in (8.2.3) and (8.2.4), the function q → q − β(q)y is nondecreasing for every |y| ≤ 1 showing that F_{n,0}^{-1} is nondecreasing if u ≥ 0. Thus, F_{n,0}^{-1} is in fact a q.f. Moreover, this construction has the favorable property that the range of F_{n,0}^{-1} is a subset of the support of the underlying d.f. F. Writing U(z) = ∫_{−1}^z u(y) dy we have

F_{n,0}^{-1}(q) = Σ_{i=1}^n [U((q − (i − 1)/n)/β(q)) − U((q − i/n)/β(q))] X_{i:n}.   (8.2.16)

It is easy to verify that the coefficients are equal to zero if i ≤ n(q − β(q)) or i ≥ n(q + β(q)) + 1.
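Using the representation (8.2.16), the kernel q.f. estimator can be sketched as follows (with the Epanechnikov antiderivative U given in (8.2.21) below; the function names are ours):

```python
def U_epan(z):
    # integral of the Epanechnikov kernel u(x) = (3/4)(1 - x^2) on [-1, 1]
    if z < -1:
        return 0.0
    if z > 1:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3

def kernel_qf(xs, q, beta_fn):
    # (8.2.16): sum over i of [U((q - (i-1)/n)/b) - U((q - i/n)/b)] X_{i:n}
    xs = sorted(xs)
    n = len(xs)
    b = beta_fn(q)
    est = 0.0
    for i in range(1, n + 1):
        a = U_epan((q - (i - 1) / n) / b) - U_epan((q - i / n) / b)
        est += a * xs[i - 1]
    return est
```

The scores telescope to 1 when β(q) ≤ min(q, 1 − q), so a constant sample is reproduced exactly, and by symmetry of u the estimate of the median of 1, ..., 10 is 5.5.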
or, alternatively,

F_{n,0}(x) = n^{-1} Σ_{i=1}^n U((x − ξ_i)/β).   (8.2.18)
Density Estimation
The kernel method enables us to construct differentiable functions as estimates of the d.f. F and the q.f. F^{-1}, although the initial estimates are step functions. Thus, we get estimators of the density f = F′ and the density quantile function (F^{-1})′ = 1/f(F^{-1}) as well.
From (8.2.18) we obtain

F_{n,1}(x) = F′_{n,0}(x) = (nβ)^{-1} Σ_{i=1}^n u((x − ξ_i)/β).   (8.2.19)
(8.2.20)

for β ≤ q ≤ 1 − β. With β(q) as in (8.2.4) the same representation of F_{n,1}^{-1} holds for 2β ≤ q ≤ 1 − 2β. However, now F_{n,1}^{-1} also exists on (0, 1) and can easily be computed.
Some Illustrations
In the sequel, we shall apply the Epanechnikov kernel defined by

u(x) = (3/4)(1 − x²) 1_{[−1,1]}(x).   (8.2.21)

Notice that

U(x) = 0 if x < −1,
U(x) = 1/2 + 3x/4 − x³/4 if −1 ≤ x ≤ 1,
U(x) = 1 if x > 1.
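With these formulas, the kernel d.f. (8.2.18) and the kernel density (8.2.19) take only a few lines; this sketch (our naming) uses a constant bandwidth β:

```python
def u_epan(x):
    # Epanechnikov kernel (8.2.21)
    return 0.75 * (1.0 - x * x) if -1.0 <= x <= 1.0 else 0.0

def U_epan(z):
    # its antiderivative, as displayed above
    if z < -1:
        return 0.0
    if z > 1:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3

def kernel_df(xs, x, beta):
    # kernel d.f. F_{n,0}(x) = n^{-1} sum U((x - xi_i)/beta), cf. (8.2.18)
    return sum(U_epan((x - xi) / beta) for xi in xs) / len(xs)

def kernel_density(xs, x, beta):
    # kernel density F_{n,1}(x) = (n*beta)^{-1} sum u((x - xi_i)/beta), cf. (8.2.19)
    return sum(u_epan((x - xi) / beta) for xi in xs) / (len(xs) * beta)
```

A numerical derivative of `kernel_df` reproduces `kernel_density`, mirroring F_{n,1} = F′_{n,0}.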
In Figures 8.2.3 and 8.2.4 the kernel q.f. F_{n,0}^{-1} and the q.f. (F_{n,0})^{-1} of the kernel d.f. F_{n,0} are based on n = 100 pseudo standard exponential random numbers. For q bounded away from 0 and 1 one realizes that F_{n,0}^{-1} and (F_{n,0})^{-1} have about the same performance. Near to 0 and 1 the estimate taken from (F_{n,0})^{-1} has the unpleasant property that (a) it is inaccurate and (b) it attains values which do not belong to the support of the exponential d.f. The second property is of course not very surprising. To avoid this unpleasant behavior of (F_{n,0})^{-1} one should modify F_{n,0}(x) in such a way that the bandwidth depends on x.
Figure 8.2.3. Kernel q.f. F_{n,0}^{-1}; n = 100, β = 0.08.
Figure 8.2.4. Q.f. (F_{n,0})^{-1} of the kernel d.f.; n = 100, β = 0.08.
Figures 8.2.3 and 8.2.4 show clearly that the kernel estimates reduce the random fluctuation of the "natural" estimates, thus also reducing the maximum deviation from the underlying d.f.
Next F_{n,0}^{-1} and (F_{n,0})^{-1} will be evaluated at the right end of the domain. Again F_{n,0}^{-1} is defined with the bandwidth function in (8.2.4).
Figure 8.2.5. Kernel q.f. F_{n,0}^{-1} near q = 1; β = 0.08.
Figure 8.2.6. Q.f. (F_{n,0})^{-1} of the kernel d.f. near q = 1; β = 0.08.
At the first moment I thought there was an error in the computer program when the graph in Figure 8.2.6 appeared on the screen. The graph of (F_{n,0})^{-1} can hardly be distinguished from the sample q.f. The explanation for (F_{n,0})^{-1} being close to the sample q.f. is that the largest order statistics are not close to each other, and so the kernel d.f. with the bandwidth β = 0.08 does not smooth the sample d.f.
Figures 8.2.7 and 8.2.8. n = 100, β = 0.08.
Figure 8.2.9. July: Kernel density and Weibull density with parameters μ = 37.3, σ = 8.5, α = 2.7.
Figure 8.2.10. September: Kernel density and Weibull density with parameters μ = 44.0, σ = 19.8, α = 7.0.
bandwidth β = 2.0. A better fit can be achieved by more smoothing, that is, for a larger bandwidth.
We see that the densities of the maxima of temperature in July (Fig. 8.2.9)
are skewed to the left; those for September (Fig. 8.2.10) are skewed to the right.
Below we also include the corresponding Weibull densities for June and
August which are close together. That for June is nearly symmetric and that
for August is slightly skewed to the right. The kernel density for annual
maxima is nearly symmetric.
The largest observed values of monthly maxima within 133 years are
Figure 8.2.11. Kernel density for annual maxima; Weibull densities: June: μ = 38.2, σ = 10.6, α = 3.9; July: μ = 37.3, σ = 8.5, α = 2.7; August: μ = 39.8, σ = 12.2, α = 4.1; September: μ = 44.0, σ = 19.8, α = 7.0.
(a) 36.8 in June 1947, (b) 35.6 in July 1911, (c) 35.8 in August 1857, and (d) 34.2
in September 1949.
We suggest classifying the annual maximum as a maximum of independent, not identically distributed Weibull r.v.'s according to the maxima in June, July, August, and September. According to (1.3.4), the calculation of the d.f. and the density of the maximum of not identically distributed r.v.'s creates no difficulties. The resulting density shows an excellent fit to the kernel density of the annual maxima as given in Figure 8.2.11.
The choice of the Weibull density was accomplished by some visual, subjective judgment. To obtain an automatic procedure one should fix a distance between densities like the maximum deviation, χ²-distance, Hellinger distance, or some other distance. Then, take that parameter (μ, σ, α) which minimizes the distance between the kernel density and the Weibull density.
From the foregoing remarks it becomes obvious that our estimates are produced by some kind of minimum distance method. By using this method we
are getting larger estimates of the unknown right endpoint than by taking the
sample maximum. Recall that the "minimum distance" estimates are 38.2, 37.3,
39.8, and 44.0 compared to the sample maxima 36.8, 35.6, 35.8, and 34.2. The
difference is particularly significant in those cases where the density is skewed
to the right.
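The minimum distance method just described can be illustrated by a crude grid search (entirely our own sketch; the book does not specify a numerical search procedure, and the Weibull-for-maxima parametrization with right endpoint μ is our reading of the parameters (μ, σ, α)):

```python
import math

def weibull_max_density(x, mu, sigma, alpha):
    # density of the d.f. exp(-((mu - x)/sigma)**alpha) for x < mu
    # (Weibull-type extreme value distribution with right endpoint mu)
    if x >= mu:
        return 0.0
    z = (mu - x) / sigma
    return (alpha / sigma) * z ** (alpha - 1.0) * math.exp(-z ** alpha)

def min_distance_fit(target_density, grid, mus, sigmas, alphas):
    # grid search for the (mu, sigma, alpha) minimizing the maximum deviation
    # from target_density over the evaluation grid
    best, best_d = None, float("inf")
    for mu in mus:
        for sigma in sigmas:
            for alpha in alphas:
                d = max(abs(target_density(x) - weibull_max_density(x, mu, sigma, alpha))
                        for x in grid)
                if d < best_d:
                    best, best_d = (mu, sigma, alpha), d
    return best
```

In practice one would replace `target_density` by the kernel density of the data and use a finer parameter grid or a proper optimizer.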
Hosking (1985) developed a modified Newton–Raphson iteration algorithm for solving the maximum likelihood equation in the 3-parameter extreme value model (given by the von Mises parametrization). This algorithm seems to work if |β| < 0.5. When using the "minimum distance" estimates, given in Figure 8.2.11, as initial estimates, then one obtains the following estimates:
June: μ = 39.1, σ = 11.4, α = 4.3;    July: μ = 36.2, σ = 7.4, α = 2.4;
August: μ = 38.7, σ = 11.1, α = 4.0;    September: μ = 38.8, σ = 14.4, α = 5.2.
F_{n,0}^{-1}(q) = ∫ β^{-1} u((q − y)/β) F_n^{-1}(y) dy   (8.3.1)

if 2β < q < 1 − 2β. Notice that under appropriate regularity conditions the ith derivative F_{n,i}^{-1} of F_{n,0}^{-1} is given by

(8.3.2)
Moderate Deviations
Our first aim will be to deduce rough bounds for the rate of convergence of
kernel estimators of the q.f. and its derivatives. For this purpose we shall study
again the oscillation property of the sample q.f.
The basic tool for the following considerations will be Lemma 3.1.7(ii)
which describes the stochastic behavior of
(8.3.3)
uniformly over q_1, q_2 with 0 < p_1 ≤ q_1 < q_2 ≤ p_2 < 1.
In the sequel we shall assume that the kernel u satisfies the following
regularity conditions:
∫ u^{(i)}(y) y^i dy = i!, i = 0, ..., m + 1, and ∫ u^{(i)}(y) y^j dy = 0, 0 ≤ j < i ≤ m + 1.   (8.3.4)

∫ u(y) y^j dy = 0, j = 1, ..., k, and ∫ u^{(i)}(y) y^{i+j} dy = 0, j = 1, ..., k.   (8.3.5)

(8.3.7)

if again 2β < q < 1 − 2β and if the derivatives of F^{-1} at q exist.
We remark that (8.3.7) always holds for k = 0.
The representation above shows that the remainder term splits up (a) into a random part which is governed by the oscillation behavior of the sample q.f. and (b) into a non-random part which depends on the remainder term of a Taylor expansion of F^{-1} about q.
It is evident that a similar representation holds for the sample d.f. F_n in place of the sample q.f. F_n^{-1}. Recall that the oscillation behavior of F_n was studied in Remark 6.3.3.
The histograms with random or non-random cells are based on terms of the form

(F_n^{-1}(q_2) − F_n^{-1}(q_1))/(q_2 − q_1)

or

(F_n(t_2) − F_n(t_1))/(t_2 − t_1).

Thus, the oscillation behavior of the sample q.f. and the sample d.f. can be regarded as a property which summarizes the properties of histograms.
The representation (8.3.6) shows that the stochastic behavior of kernel estimators of the q.f. is exhaustively determined by the oscillation behavior of the sample q.f.
(i) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,0}^{-1}(q) − F^{-1}(q)| > C[(β(log n)/n)^{1/2} + β^{m+1}] } < Bn^{-s},

(ii) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,1}^{-1}(q) − (F^{-1})^{(1)}(q)| > C[((log n)/(nβ))^{1/2} + β^m] } < Bn^{-s},

(iii) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,i}^{-1}(q) − (F^{-1})^{(i)}(q)| > C[((log n)/(nβ^{2i−1}))^{1/2} + β^{m−i+1}] } < Bn^{-s}.
PROOF. It is easy to see that Lemma 8.3.3(i) holds with (log n)/n in place of β(log n)/n if 0 < β ≤ (log n)/n. This yields that for every ε > 0:

P{ sup_{p_1 ≤ q ≤ p_2} |n^{1/2}(F_{n,0}^{-1}(q) − F_n^{-1}(q))| ≥ ε } → 0.   (8.3.8)
E(F_{n,0}(t) − F(t))² = F(t)(1 − F(t))/n − 2(β/n) F′(t) ∫ x u(x) U(x) dx + (β^{k+1} ∫ |u(x) x^{k+1}| dx/(k + 1)!)² + O(β²/n).   (8.3.9)
This result enables us to compare the mean square error E(F_{n,0}(t) − F(t))² of F_{n,0}(t) and the variance E(F_n(t) − F(t))² = F(t)(1 − F(t))/n of the sample d.f. F_n(t) evaluated at t. If F′(t) > 0 and the bandwidth β is chosen so that the right-hand side of (8.3.9) is sufficiently small then the term ∫ x u(x) U(x) dx can be taken as a measure of performance of F_{n,0}(t). If

(8.3.10)

then ∫ x u(x) U(x) dx ≥ 0 since the integrand on the right-hand side is non-negative. Notice that a non-negative kernel u satisfies Condition 8.3.2 only if k = 1.
From (8.3.9) we see that F_{n,0}(t) and F_n(t) have the same asymptotic efficiency; however, F_n(t) is asymptotically deficient w.r.t. F_{n,0}(t). The concept of deficiency was introduced by Hodges and Lehmann (1970). Define

(8.3.11)

Thus, i(n) is the smallest integer m such that F_m(t) has the same or a better performance than F_{n,0}(t). Since i(n)/n → 1, n → ∞, we know that F_{n,0}(t) and F_n(t) have the same asymptotic efficiency. However, the relative deficiency i(n) − n of F_n(t) w.r.t. F_{n,0}(t) quickly tends to infinity as n → ∞. In short, we may say that the relative deficiency i(n) − n is the number of observations that are wasted if we use the sample d.f. instead of the kernel estimator.
The comparison of F_n(t) and F_{n,0}(t) may as well be based on covering probabilities. The Berry–Esseen theorem yields

(8.3.12)

where σ² = F(t)(1 − F(t)). The Berry–Esseen theorem, Theorem 8.3.4, and P.8.6 lead to the following theorem. For β > 0 we have

= 2Φ[ y(3/2 − E(F_{n,0}(t) − F(t))²/(2E(F_n(t) − F(t))²)) ] + O(n^{-1/2} + (β + nβ^{2(m+1)})^{3/2}).   (8.3.13)
We see that the performance of F_{n,0}(t) again depends on the mean square error. A modified definition of the relative deficiency, given w.r.t. covering probabilities, leads to the same conclusion as in the case of the mean square error.
In analogy to the results above, one may compare the performance of the sample q-quantile F_n^{-1}(q) and a kernel estimator F_{n,0}^{-1}(q). If the comparison is based on the mean square error, one has to impose appropriate moment conditions. To avoid this, we restrict our attention to covering probabilities. Recall from Section 4.2 that under weak regularity conditions,
Lemma 8.3.6. Let F_{n,0}^{-1} be the kernel estimator of the q.f. as given in (8.3.1). Suppose that the kernel u satisfies Conditions 8.3.1(i), (ii). Suppose that the q.f. F^{-1} has a bounded second derivative on a neighborhood of the fixed point q ∈ (0, 1), and that f(F^{-1}(q)) > 0. Then, if β ≡ β(n) → 0, n → ∞, we have

P{(n^{1/2}/σ_n)(F_{n,0}^{-1}(q) − μ_n) ≤ y} = Φ(y) + O(log(n) n^{-1/4})   (8.3.15)

where μ_n and σ_n² are given by

(8.3.16)

and

(8.3.17)

Moreover,

(8.3.18)

as n → ∞.

(8.3.19)
This shows that the performance of F_{n,0}^{-1}(q) depends on the "mean square error" σ_n²/n + (μ_n − F^{-1}(q))². As in Falk (1985a, proof of Theorem 2.3) we may prove that

σ_n² = σ_0² − 2β(n) ∫ x u(x) U(x) dx + O(β(n)²)   (8.3.20)

and

|μ_n − F^{-1}(q)| = o(β(n)^{k+1}).   (8.3.21)
Recall that the kernel q.f. is given by

F_{n,0}^{-1}(q) = ∫ (1/β(q)) u((q − y)/β(q)) F_n^{-1}(y) dy   (8.4.1)

where ∫ u(y) dy = 1, u ≥ 0, u(x) = 0 for |x| > 1, and β(q) is the bandwidth function in (8.2.4). Denote by F_{n,0} the smooth sample d.f. which is defined as the inverse of the kernel q.f. F_{n,0}^{-1}.
By plugging F_{n,0} into T_n(·, t) (instead of F_n) we get the smooth bootstrap d.f. T_n(F_{n,0}, ·).
We remark that one may also use the kernel estimator of the d.f. as introduced in Section 8.2.
Since F_{n,0} is absolutely continuous one can expect that the smooth bootstrap d.f. T_n(F_{n,0}, ·) is also absolutely continuous. This will be illustrated in the particular case of the q-quantile.
Illustration
Given n i.i.d. random variables with standard normal d.f. Φ define again, as in Section 6.4, the normalized d.f. of the sample q-quantile by

T_n*(F, t) = T_n(F, (q(1 − q))^{1/2} t/(n^{1/2} φ(Φ^{-1}(q)))).
compare the normalized d.f. T_n*(F, ·) of the sample q-quantile, the normalized bootstrap d.f. T_n*(F_n, ·), and the normalized smooth bootstrap d.f. T_n*(F_{n,0}, ·). The kernel q.f. F_{n,0}^{-1} is defined with the bandwidth function in (8.2.4) with β = 0.07. Moreover, u is the Epanechnikov kernel.
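A smooth bootstrap resampling step can be sketched as follows. We draw from the kernel d.f. F_{n,0}(x) = n^{-1} Σ U((x − ξ_i)/β), which is equivalent to drawing ξ_J + βV with J uniform on {1, ..., n} and V distributed according to the kernel u; note that this is the kernel-d.f. variant of Section 8.2, not the inverse of the kernel q.f. used in the text, and all names are ours:

```python
import random

def epan_sample(rng):
    # acceptance-rejection draw from the Epanechnikov density (3/4)(1 - x^2)
    while True:
        x = rng.uniform(-1.0, 1.0)
        if rng.random() <= 1.0 - x * x:
            return x

def smooth_bootstrap_median(xs, beta, n_boot, seed=0):
    # bootstrap medians drawn from the kernel d.f. with bandwidth beta:
    # each resampled value is xi_J + beta * V, V ~ Epanechnikov
    rng = random.Random(seed)
    n = len(xs)
    meds = []
    for _ in range(n_boot):
        star = sorted(rng.choice(xs) + beta * epan_sample(rng) for _ in range(n))
        meds.append(star[n // 2])
    return meds
```

Because the resampled values have an absolutely continuous distribution, the bootstrap distribution of the median is smooth, in contrast to the ordinary bootstrap.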
(8.4.2)

μ_n(F, B) = T_n(F_{n,0}, B) − T_n(F, B).   (8.4.3)

Notice that μ_n(F, ·) is the difference of two random probability measures and thus a random signed measure. Below we shall study the stochastic behavior of μ_n(F, ·) as n → ∞ in the particular case of the q-quantile T(F) = F^{-1}(q).
Let 𝒮 be a system of Borel sets. We shall study the asymptotic behavior of

sup_{B ∈ 𝒮} |μ_n(F, B)|

in the particular case of the functional T(F) = F^{-1}(q) for some fixed q ∈ (0, 1). Put
and
ν_n(F, B) = ∫_B [1 − (x/σ_n)²] dN_{(0, σ_n²)}(x)   (8.4.4)

and

sup_{t>0} |ν_n(F, [−t, t])| = sup_B |ν_n(F, B)| = (2/(πe))^{1/2}.   (8.4.5)
Theorem 8.4.1. Suppose that nβ_n → ∞ as n → ∞. Then,

P_F{ (nβ_n / ∫ y² u²(y) dy)^{1/2} sup_{B ∈ 𝒮} |μ_n(F, B)| ≤ t } → 2Φ(t) − 1   (8.4.6)

as n → ∞.
The key idea of the proof is to compute the asymptotic normality of the sample q-quantile of i.i.d. random variables with common q.f. F_{n,0}^{-1}. According to Lemma 8.3.3 such q.f.'s satisfy the required smoothness conditions with high probability.
A version of Theorem 8.4.1, with 𝒮 = {(−∞, t]} and F_{n,0} being the smooth sample d.f., is proved in Falk and Reiss (1989). A detailed proof of the present result will be given somewhere else.
If β_n = n^{-1/3} then the accuracy of the bootstrap approximation is, roughly speaking, of order O(n^{-1/3}). The choice of β_n = n^{-1/2} leads to a bootstrap estimator related to that of Section 6.4 as far as the rate of convergence is concerned.
Under stronger regularity conditions it is possible to construct bootstrap estimates of a higher accuracy. Assume that F^{-1} has three bounded derivatives near q and that the kernel u has three bounded derivatives. Moreover, assume that ∫ u(x) x dx = 0. Notice that nonnegative, symmetrical kernels u satisfy this condition. Then, the condition nβ_n³ → 0 in Theorem 8.4.1 can be weakened to nβ_n⁵ → 0 as n → ∞. This yields that the rate of convergence of the smooth bootstrap d.f. is, roughly speaking, of order O(n^{-2/5}) for an appropriate choice of β_n.
qrkp(r) = 1/2.
E ~
and P-I(q)
<
u.
>
3. Let φ and ℱ be as in Lemma 8.1.3 and let 𝒢 be a sub-family of ℱ. For ε > 0 define an "ε-neighborhood" 𝒢_ε of 𝒢 by

𝒢_ε = {F ∈ ℱ : ..., G ∈ 𝒢}

if F ∈ 𝒢_{ε/2} and q − ... 4(1 + ε) ...
4. Consider a kernel density estimator f_n with kernel u and bandwidth

β = n^{-1/5} [ f(x) ∫ u²(y) dy / (f^{(2)}(x) ∫ u(y) y² dy)² ]^{1/5}.

For this choice of β, the mean square error of f_n(x) satisfies the relation

E[f_n(x) − f(x)]² = (5/4) n^{-4/5} [ f(x) ∫ u²(y) dy ]^{4/5} [ f^{(2)}(x) ∫ u(y) y² dy ]^{2/5} + o(n^{-4/5}).

5. (i) (a) Put

e_{2j−1}(x) = 2^{1/2} cos(2πjx) and e_{2j}(x) = 2^{1/2} sin(2πjx), j = 1, 2, 3, ....

(b) Let ... ∫ f(x) g(x) dx].

(ii) Consider the orthogonal series estimator

f_n(x) = 1 + Σ_{j=1}^s ( n^{-1} Σ_{i=1}^n e_j(ξ_i) ) e_j(x).

Then

∫ Var(f_n(x)) dx = n^{-1} Σ_{j=1}^s ∫ e_j²(x) f(x) dx − n^{-1} (1 + Σ_{j=1}^s a_j²) = O(s/n).

Use

∫ (f_n(x) − 1)² dx

as a test statistic for testing the uniform distribution on (0, 1) against alternatives as given in (i)(b) with s ≡ s(n) → ∞ as n → ∞.
(Compare with Example 10.4.1.)
6. There exists a constant C(p) > 0, only depending on p > 0, such that
IN(~n,y~){Jl-ayn-1/2, ~ + ayn-l/2)_ 2<D(y[1
0,
Vn
> 0, a
ai, n(~n p, -
00
~)2)}/2])-11
~)2))3/2
< /-In,
<
00
7. Denote by G_n^{-1} the sample q.f. if F_{n,0} is the underlying d.f. Prove that

P_{F_{n,0}}{(G_n^{-1}(q) − F_{n,0}^{-1}(q))/F_{n,1}^{-1}(q) ∈ B} ...
Bibliographical Notes
It was proved by Pfanzagl (1975) that the sample q-quantile (including the sample median) is an asymptotically efficient estimator of the q-quantile (the median) in the class of all asymptotically median unbiased estimators. It is well known that for symmetric densities one can find nonparametric estimators of the symmetry point which are as efficient as parametric estimators; according to Pfanzagl's result a corresponding procedure is not possible if there is even the slightest violation of the symmetry condition.
CHAPTER 9
ξ_1, ..., ξ_k; θ ∈ Θ.   (9.1.2)
(9.1.4)
where the inf is taken over the given class of estimators θ*. Notice that (9.1.4) can be modified to a local minimax criterion by taking the sup over a neighborhood of θ_0 for each θ_0 ∈ Θ.
The Bayes risk of an estimator θ* w.r.t. a "prior distribution Λ" is given by the weighted risk

∫ E_θ L(θ*, θ) dΛ(θ);   (9.1.5)

a Bayes estimator minimizes this risk, where the inf is taken over the given class of estimators θ*. In certain applications one also considers generalized Bayes estimators where Λ is a measure; this generalization e.g. leads to Pitman estimators (compare with (10.1.23)). For a detailed treatment of Bayes and minimax procedures we refer to Ibragimov and Has'minskii (1981) and Witting (1985).
Alternatively, one can try to find an optimal estimator within a class of estimators which satisfy an additional regularity condition. Recall that if the estimators are assumed to be expectation unbiased then the use of (9.1.1) leads to the famous Cramér–Rao bound as a lower bound for the variance. In the nonparametric context (e.g. when estimating a density) one has to admit a certain amount of bias of the estimator to gain a smaller mean square error.
The extension of the concept above to randomized estimators (Markov kernels having their distributions on the parameter space Θ) is straightforward. Notice that E_θ L(θ*, θ) = ∫ L(·, θ) dQ_θ where Q_θ is the distribution of θ*. The extension is easily obtained by putting the distribution of the randomized estimator in place of Q_θ.
A different restriction is obtained by the requirement that the estimator θ* is median unbiased or asymptotically median unbiased (compare with Section 8.1).
Moreover, we shall base our calculations on covering probabilities of the
form
P{−t′ ≤ θ* − θ ≤ t″}.   (9.1.6)

will be replaced by

(9.1.8)

θ ∈ Θ.   (9.1.9)

= {P_{θ,h} : θ ∈ Θ, h ∈ H(θ)}.

P{ξ ∈ B} − P{η ∈ B}   (9.1.10)

|P{ξ ∈ B} − P{η ∈ B}| ≤ ε(θ, h).   (9.1.11)

θ(T(ξ)),   (9.1.12)
For every procedure acting on 𝒬, we found a procedure on 𝒫 with the same performance (within a certain error bound). Until now we have not excluded the possibility that there exists a procedure on 𝒫 which is superior to those carried over from 𝒬 to 𝒫. However, if T is a one-to-one map (as e.g. in Example 9.1.1), one may interchange the role of 𝒬 and 𝒫 by taking the inverse T^{-1} instead of T. Thus, the optimal procedure on 𝒫 can be regained from the corresponding one on 𝒬.
In connection with loss functions the parameter θ is not necessarily real-valued. The extension of the concept to functional parameters is obvious.
EXAMPLE 9.1.1. Section 9.2 will provide a simple example for the comparison of two models. Here, with θ = (σ, α), P_θ is the Fréchet distribution with scale parameter σ and shape parameter 1/α, and Q_θ is the Gumbel distribution with location parameter log σ and scale parameter α. The transformation T is given by T = log.
Moreover, given a sample of size k one has to take the transformation

T(x_1, ..., x_k) = (log x_1, ..., log x_k).   (9.2.1)
distributions. The following two formulas are well known (see e.g. Johnson and Kotz (1970)):

∫ x dG_3(x) = γ   (9.2.2)

and

∫ x² dG_3(x) = γ² + π²/6,   (9.2.3)

where γ = 0.5772... denotes Euler's constant. From (9.2.2) and (9.2.3) it is obvious that a r.v. η with d.f. G_3^{(θ,α)} has the expectation

Eη = θ + αγ   (9.2.4)

and variance

Var η = α²π²/6.   (9.2.5)
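The moment formulas (9.2.2) and (9.2.3) can be checked by numerical integration (a sketch with names of our own choosing; γ ≈ 0.5772 is Euler's constant):

```python
import math

def gumbel_density(x):
    # density of the Gumbel d.f. G_3(x) = exp(-exp(-x))
    return math.exp(-x - math.exp(-x))

def gumbel_moments(a=-10.0, b=30.0, steps=40000):
    # trapezoidal approximation of the first two moments of G_3; by
    # (9.2.2)-(9.2.3) these should equal gamma and gamma^2 + pi^2/6
    h = (b - a) / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        x = a + i * h
        w = 0.5 if i in (0, steps) else 1.0
        g = gumbel_density(x)
        m1 += w * x * g * h
        m2 += w * x * x * g * h
    return m1, m2
```

The truncation to [−10, 30] is harmless because the Gumbel density decays doubly exponentially on the left and exponentially on the right.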
The Fisher information matrix can be written as I(θ_1, θ_2). By partial integration one can easily deduce from (9.2.4) and (9.2.5) that

I(θ, α) = α^{-2} [ 1, γ − 1 ; γ − 1, π²/6 + (1 − γ)² ].   (9.2.6)

(9.2.7)

→ Φ(t),  n → ∞,   (9.2.8)
The m.l. equations are

Σ_{i=1}^k e^{−(x_i − θ)/α} = k   (9.2.9)

and

Σ_{i=1}^k (x_i − θ)[1 − e^{−(x_i − θ)/α}] = kα.   (9.2.10)

From (9.2.9),

θ = −α log[ k^{-1} Σ_{i=1}^k e^{−x_i/α} ].   (9.2.11)

Inserting (9.2.11) into (9.2.10) one obtains the equation

g(α) = 0   (9.2.12)

with g defined by

g(α) = α − k^{-1} Σ_{i=1}^k x_i + Σ_{i=1}^k x_i e^{−x_i/α} / Σ_{i=1}^k e^{−x_i/α}.   (9.2.13)

Observe that the solution α̂_k(x_1, ..., x_k) of the equation (9.2.12) has the following property: For reals θ and α > 0 we have

α̂_k(θ + αx_1, ..., θ + αx_k) = α α̂_k(x_1, ..., x_k).   (9.2.14)

This property yields that there exist correction terms which make the m.l. estimator of α median unbiased. The corresponding result also holds w.r.t. the expectation unbiasedness.
Equation (9.2.12) has to be solved numerically; however, this can hardly be regarded as a serious drawback in the computer era. Approximate solutions can be obtained by the Newton–Raphson iteration procedure. Notice that

(6^{1/2}/π) S_k(η_1, ..., η_k),   (9.2.16)

where S_k denotes the sample standard deviation, is a natural initial estimate of α in view of (9.2.5).
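A sketch of the numerical solution of (9.2.12): instead of Newton–Raphson we use bisection, which is more robust for an illustration (the function names are ours; g is the equation from (9.2.13), computed with a shift by min(x_i) for numerical stability):

```python
import math

def gumbel_ml_g(xs, a):
    # g(a) from (9.2.13); the weights are computed relative to min(xs) so
    # that the exponents stay non-positive (no overflow for small a)
    m = min(xs)
    w = [math.exp(-(x - m) / a) for x in xs]
    return a - sum(xs) / len(xs) + sum(x * wi for x, wi in zip(xs, w)) / sum(w)

def gumbel_ml_scale(xs, lo=1e-9, hi=None, iters=200):
    # bisection for the root of g: g(a) -> min(xs) - mean(xs) < 0 as a -> 0+
    # and g(a) -> a > 0 for large a, so a sign change exists whenever the
    # observations are not all equal
    if hi is None:
        hi = max(xs) - min(xs) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gumbel_ml_g(xs, lo) * gumbel_ml_g(xs, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Given α̂_k, the location estimate θ̂ follows from (9.2.11).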
Efficient Estimation of α

Let us concentrate on estimating the parameter α. (9.2.14) yields that (9.2.8) holds uniformly over the location and scale parameters θ and α. A further consequence is that the m.l. estimator is asymptotically efficient in the class of all estimators α_k*(η_1, ..., η_k) which are asymptotically median unbiased in a locally uniform way. For such estimators we get for every t′, t″ > 0,

P{−t′ k^{-1/2} ≤ α_k*(η_1, ..., η_k) − α ≤ t″ k^{-1/2}}
≤ P{−t′ k^{-1/2} ≤ α̂_k(η_1, ..., η_k) − α ≤ t″ k^{-1/2}} + o(1).   (9.2.17)
We return to the Fréchet model of d.f.'s G_{1,1/α}^{(0,σ)} with scale parameter σ and shape parameter 1/α. The results above can easily be made applicable to the Fréchet model.
If ξ_1, ..., ξ_k are i.i.d. random variables with common d.f. G_{1,1/α}^{(0,σ)} then it follows from (9.2.8) and the discussion in Section 9.1 that

(9.2.18)

as n → ∞.
the condition

f(x) = (σα)^{-1} (x/σ)^{−(1+1/α)} e^{h(x/σ)}   (9.3.1)

where

|h(x)| ≤ L|x|^{−δ/α} for x ≥ x_0.

(9.3.2)

uniformly over k, m and densities f which satisfy (9.3.1) for some fixed values x_0, L and δ. Notice that the maximum of m i.i.d. r.v.'s with d.f. G_{1,1/α}^{(0,σ)} has the d.f. G_{1,1/α}^{(0,σm^α)}.
Let again α̂_k be the solution of the m.l. equation (9.2.12). Combining (9.2.18) and (9.3.2) we get

Theorem 9.3.1.

P{(k/V(α))^{1/2} [α̂_k(log X_{m:m}^{(1)}, ..., log X_{m:m}^{(k)}) − α] ≤ t} = Φ(t) + O(k^{1/2}(m^{−δ} + m^{−1}) + k^{−1/2})   (9.3.3)

uniformly over t, k, m and densities f which satisfy condition (9.3.1) for some fixed constants x_0, L, and δ. Moreover, V(α) = 6α²/π².
The properties of the m.l. estimator carry over from the parametric to the nonparametric framework.
In an insurance context, the inference may be based on the claim numbers per period,

    M = [N(s)/k].    (9.3.4)

Denote by X^{(i)}_{M:M} the maximum claim size of the r.v.'s η_{(i−1)M+1}, ..., η_{iM}. Thus, using the notation of (1.1.4) we get the representation

    X^{(i)}_{M:M} = max{η_{(i−1)M+1}, ..., η_{iM}}.    (9.3.5)

One obtains

    P{(k/V(α))^{1/2}[α̂_k(log X^{(1)}_{M:M}, ..., log X^{(k)}_{M:M}) − α] ≤ t}
        = Φ(t) + O[k^{1/2}(Σ_{m=0}^∞ (m^{−δ} + m^{−1})P{M = m}) + k^{−1/2}]    (9.3.6)

uniformly over t, k, m and densities f which satisfy condition (9.3.1) for some fixed constants x_0, L, and δ.

PROOF. We have

    P{α̂_k(log X^{(1)}_{M:M}, ..., log X^{(k)}_{M:M}) ≤ t}
        = Σ_{m=0}^∞ P{α̂_k(log X^{(1)}_{m:m}, ..., log X^{(k)}_{m:m}) ≤ t} P{M = m}

with X^{(j)}_{m:m} as in Theorem 9.3.1. Now the assertion is immediate since Theorem 9.3.1 holds uniformly over m. □
If the distribution of M is highly concentrated about a fixed value, say m,
then it is apparent that the right-hand side of (9.3.6) is again that of (9.3.3).
Another interesting problem arises if k periods of length t_i − t_{i−1} are fixed. Notice that the claim numbers N(t_1), N(t_2) − N(t_1), ..., N(t_k) − N(t_{k−1}) of the k periods are independent. Again the statistical inference can be based on
the maximum claim sizes of each period. After conditioning on the claim
numbers the maximum claim sizes can again approximately be represented
by independent Gumbel r.v.'s which, however, are not identically distributed.
Consider the model

    {G^{(k)}_{1,1/α}(·/σ): σ > 0, α > 0}    (9.4.1)

with location parameter 0 and scale parameter σ. This model arises out of the Fréchet distributions G_{1,1/α}. The model in (9.4.1) can be transformed to the model

    {Q_α^{k−1} × G_{1,1/α,k}(·/σ): σ > 0, α > 0}    (9.4.2)

by means of the transformation to the weighted log-spacings of the k largest values.    (9.4.3)
Exponential Model
The statistical inference is particularly simple in the exponential model

    {Q_α^{k−1}: α > 0}.    (9.4.4)

Asymptotically, one does not lose information by restricting model (9.4.2) to model (9.4.4) as far as the evaluation of the parameter α is concerned (proof!).
The m.l. estimator

    α̂_{k−1}(η_1, ..., η_{k−1}) = (k − 1)^{−1} Σ_{i=1}^{k−1} η_i    (9.4.5)

is unbiased, with variance α²/(k − 1).    (9.4.6)
The Fisher information is

    J(α) = ∫ [(∂/∂α) log(exp(−x/α)/α)]² dQ_α(x) = α^{−2};    (9.4.7)

thus, α̂_k(η_1, ..., η_k) attains the Cramér-Rao bound (kJ(α))^{−1}. The central limit theorem yields the asymptotic normality of α̂_k(η_1, ..., η_k). We have

    P{k^{1/2}[α̂_k(η_1, ..., η_k) − α]/α ≤ t} = Φ(t) + O(k^{−1/2}),    (9.4.8)

and the analogue of (9.2.17) for asymptotically median unbiased estimators holds with remainder + o(1).    (9.4.9)
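In the exponential model the m.l. estimator (9.4.5) is just a sample mean, and its variance matches the Cramér-Rao bound α²/k derived from J(α) = α^{−2}. A small Monte Carlo check (our own illustration, not from the book):

```python
import math
import random

random.seed(3)
alpha = 2.0
k = 50
reps = 20000

def mle(etas):
    # m.l. estimator of the scale parameter: the sample mean, as in (9.4.5)
    return sum(etas) / len(etas)

estimates = []
for _ in range(reps):
    # Q_alpha: exponential distribution with mean alpha, via inverse transform
    etas = [-alpha * math.log(random.random()) for _ in range(k)]
    estimates.append(mle(etas))

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps
cramer_rao = alpha ** 2 / k   # (k J(alpha))^{-1} with J(alpha) = alpha^{-2}
```

The empirical variance of the 20,000 replicated estimates should be close to the bound α²/k = 0.08.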
We return to the exponential distributions Q_α as introduced in Section 9.4. Let f be a density which satisfies condition (9.3.1), that is,

    f(x) = (σα)^{−1}(x/σ)^{−(1+1/α)} e^{h(x/σ)}    for x ≥ x_0σ,

where |h(x)| ≤ L|x|^{−δ/α} for x ≥ x_0.    (9.5.1)

Then a deficiency bound of order

    O(k^{1/2}(k/n)^{δ} + k/n)    (9.5.2)

holds uniformly over n, k ∈ {1, ..., n} and densities f which satisfy (9.5.1) for some fixed constants δ, L, and x_0.
Within this error bound the joint distribution of

    (log(X_{n:n}/X_{n−1:n}), 2 log(X_{n−1:n}/X_{n−2:n}), ..., (k − 1) log(X_{n−k+2:n}/X_{n−k+1:n}), X_{n−k+1:n})    (9.5.3)

may be replaced by that of k − 1 i.i.d. random variables with common distribution Q_α, jointly with the final component.
The optimal estimator in the exponential model {Q_α^{k−1}: α > 0} with unknown scale parameter α (compare with Section 9.4) is the m.l. estimator α̂_k(η_1, ..., η_k) = (k − 1)^{−1} Σ_{i=1}^{k−1} η_i where η_1, ..., η_k are i.i.d. random variables with common distribution Q_α. Thus, within the error bound given in (9.5.3)
the estimator

    α_{k,n}* = (k − 1)^{−1} Σ_{i=1}^{k−1} i log(X_{n−i+1:n}/X_{n−i:n})    (9.5.4)
           = (k − 1)^{−1} Σ_{i=1}^{k−1} log(X_{n−i+1:n}/X_{n−k+1:n})
has the same performance as the m.l. estimator α̂_k(η_1, ..., η_k) as far as covering probabilities are concerned. We remark that α_{k,n}* is Hill's (1975) estimator. The optimality property carries over from α̂_k(η_1, ..., η_k) to α_{k,n}*; from (9.5.3) we get the corresponding covering bound for every t′, t″ > 0.
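Hill's estimator (9.5.4) has two algebraically equivalent forms: the weighted log-spacings and the mean of log-excesses over the (k)th largest value. A Python sketch (the Pareto simulation used to check consistency is our own illustration, not from the book):

```python
import math
import random

def hill_spacings(xs, k):
    # (9.5.4): (k-1)^{-1} * sum_{i=1}^{k-1} i * log(X_{n-i+1:n} / X_{n-i:n})
    s = sorted(xs)
    n = len(s)
    return sum(i * math.log(s[n - i] / s[n - i - 1]) for i in range(1, k)) / (k - 1)

def hill_excesses(xs, k):
    # equivalent form: (k-1)^{-1} * sum_{i=1}^{k-1} log(X_{n-i+1:n} / X_{n-k+1:n})
    s = sorted(xs)
    n = len(s)
    return sum(math.log(s[n - i] / s[n - k]) for i in range(1, k)) / (k - 1)

random.seed(4)
true_alpha = 0.8
n, k = 100000, 2000
# Pareto sample with tail 1 - F(x) = x^{-1/alpha}, x > 1:  X = U^{-alpha}
xs = [random.random() ** (-true_alpha) for _ in range(n)]

a1 = hill_spacings(xs, k)
a2 = hill_excesses(xs, k)
```

The two forms agree up to floating-point error (Abel summation), and for exact Pareto data the estimate is within roughly α/k^{1/2} of the truth.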
In Section 9.6 we consider again densities f with

    f(x) = (σα)^{−1}(x/σ)^{−(1+1/α)} e^{h(x/σ)}    for x ≥ x_0σ.    (9.6.1)

Let n = mk. Given the i.i.d. random variables ξ_1, ..., ξ_n with common density f, let X^{(j)}_{m:m} be the maximum based on the jth subsample of r.v.'s ξ_{(j−1)m+1}, ..., ξ_{jm} for j = 1, ..., k. Moreover, X_{n−k+1:n}, ..., X_{n:n} are the k largest order
statistics of ξ_1, ..., ξ_n. We write
    α̂_{k,n} = α̂_k(log X^{(1)}_{m:m}, ..., log X^{(k)}_{m:m}),    (9.6.2)

where α̂_k is the solution of (9.2.12). From (9.3.3) we know that for every t,

    P{(k/V(α))^{1/2}(α̂_{k,n} − α) ≤ t} → Φ(t).    (9.6.3)

Recall from (9.5.6) that Hill's estimator α_{k,n}*, which is based on the k largest order statistics, has the following property:

    P{k^{1/2}(α_{k,n}* − α)/α ≤ t} → Φ(t)    (9.6.4)

for every t.
A comparison of (9.6.3) and (9.6.4) shows that the asymptotic relative efficiency of Hill's estimator α_{k,n}* w.r.t. the estimator α̂_{k,n}, based on the sample maxima of subsamples, is given by

    V(α)/α² = 6/π².    (9.6.5)

An optimal choice of k is of the form

    k = cn^{2δ/(2δ+1)}.    (9.6.6)
Consider densities of the form f(x) = (σα)^{−1}(x/σ)^{−(1+1/α)} exp(h(x/σ)), where h satisfies a sharper condition with exponent p and 0 < p ≤ δ ≤ 1. According to the results of Section 5.2, the expansion of length 2 of the form

    G_{1,1/α}(x/σ)(1 + m^{−1} c_p (x/σ)^{−(1+p)/α})    (9.6.8)

may be used to study the subsample maxima more accurately.
Recall that the Pareto d.f. with parameters a and α is given by

    F(x) = 1 − (x/a)^{−1/α},    x > a,    (9.7.1)

with q.f.

    F^{−1}(q) = a(1 − q)^{−α},    0 < q < 1.    (9.7.2)

Define

    F_{n,α̂}^{−1}(q) = F_{n,0}^{−1}(q)    if q ≤ x_0,
    F_{n,α̂}^{−1}(q) = F_{n,0}^{−1}(x_0)((1 − q)/(1 − x_0))^{−α_n*}    if x_0 < q,    (9.7.3)

where F_{n,0}^{−1} is the kernel q.f. as defined in Section 8.2 and α_n* is the Hill estimator defined in (9.5.4).
In Figures 9.7.1 and 9.7.2, n = 100 pseudo-random numbers were drawn according to the standard Fréchet d.f. G_{1,1}. The point x_0 was chosen to be equal to 0.9; the estimate of α is equal to 1.012. In Figure 9.7.1 the inverse (F_{n,0})^{−1} of the kernel estimator of the d.f. cannot visually be distinguished from the sample q.f. F_n^{−1} (compare this with the remarks to Figure 8.2.6).
As indicated above, the philosophy behind this procedure is the following: up to some point x_0 we only have the information that the underlying q.f. is smooth; thus, the kernel method is applicable. Beyond the point x_0 we are in the realm of extreme value theory, and hence, the use of a Pareto tail with estimated parameters may be appropriate. The choice of the point x_0 is crucial.
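The two-regime quantile estimator (9.7.3) can be sketched as follows. We substitute a simple sample q.f. for the kernel q.f. of Section 8.2, so this is an illustrative assumption rather than the book's exact procedure:

```python
import math
import random

def sample_qf(s, q):
    # crude sample quantile function; s must be sorted ascending
    idx = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[idx]

def hill(s, k):
    # Hill estimator (9.5.4), log-excess form; s sorted ascending
    n = len(s)
    return sum(math.log(s[n - i] / s[n - k]) for i in range(1, k)) / (k - 1)

def tail_qf(s, q, q0, alpha_hat):
    # (9.7.3): empirical part below q0, Pareto extrapolation beyond q0
    if q <= q0:
        return sample_qf(s, q)
    return sample_qf(s, q0) * ((1 - q) / (1 - q0)) ** (-alpha_hat)

random.seed(5)
n, q0 = 100000, 0.9
# Pareto data with tail index alpha = 1; the true q-quantile is (1 - q)^{-1}
xs = sorted(random.random() ** (-1.0) for _ in range(n))
alpha_hat = hill(xs, k=int(n * (1 - q0)))
q99 = tail_qf(xs, 0.99, q0, alpha_hat)
```

By construction the estimator is continuous at q0, and the extrapolated 0.99-quantile should be close to the true value 100 for this sample size.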
There seems to be some relationship to the well-known problem of choosing a smoothing parameter.

[Figure 9.7.2 displays F_{n,α̂}^{−1} and (F_{n,0})^{−1} over the quantile range 0.92 to 1.00, with β = 0.08.]
The moment estimators

    θ_k* = m_k − γα_k*    and    α_k* = (6^{1/2}/π)s_k

provide quick estimates; here m_k and s_k denote the sample mean and the sample standard deviation and γ is Euler's constant.
Bibliographical Notes
In this book we primarily explore the distributional properties of order
statistics and relations between models of actual distributions of order statistics and approximate, parametric models. Statistical procedures are studied
as examples to show in which way parametric statistical procedures become
relevant within the nonparametric context.
A proper place for an exhaustive list of a greater number of parametric
statistical procedures in extreme value models is a book like that of Johnson
and Kotz (1970): Chapters 18, 20, and 21 deal with exponential, Weibull, and
Gumbel models. We will only give a summary by using keywords out of these
chapters: Maximum likelihood (m.l.), minimum variance unbiased, Bayesian,
censoring, quick estimators, method of moments, best linear unbiased.
One might add (compare with Herbach (1984) and Mann (1984)) the additional keywords: best linear invariant, unbiased nearly best linear, simplified linear.
Further references may be found in the following articles. Smith (1985a)
studied the asymptotic behavior of m.l. estimators in nonregular models like
the Weibull model; see also Polfeldt (1970). By the way, see Reiss (1973, 1978b)
and Pitman (1979) for consistency results concerning m.l. estimators in models
of unimodal d.f.'s and, respectively, in models with location and scale parameters. New quick estimators of location and scale parameters in the Gumbel
model have been proposed by Hüsler and Schüpbach (1986). Quick tests and a locally most powerful test have been studied by van Montfort and Gomes (1985) for testing Gumbel d.f.'s against Fréchet and Weibull alternatives.
In Section 9.2 we mentioned the asymptotic normality of the m.l. estimator
of the location and scale parameter in the Gumbel model. Higher order
approximations of the distribution of the m.l. estimator can be obtained by
means of expansions. These expansions may e.g. be applied to establish
asymptotic median unbiasedness of a higher order. We refer to R. Michel
(1975) for expansions in the case of vector parameters and to Miebach
(1977) for a specialization of these results to families with location and scale
parameters.
Next, we make some further comments about the estimation of the tail
index and related problems. The statistical extreme value theory is based on
the idea that the parametric extreme value model is an approximation of the
model of actual distributions of maxima. This idea was made rigorous by
Weiss (1971) in a particular case by treating a model of densities in a neighborhood of Weibull densities. Weiss constructed quick estimators of the location, scale, and shape (= tail index) parameter based on extreme and intermediate
order statistics. The estimator of the tail index is based on two intermediate
order statistics. This is of interest because an alternative approach, namely,
the use of the k largest order statistics, with k being fixed, fails to entail
consistent estimators. The article of Hill (1975) attracted more attention than
that of Weiss. Presumably, the reason for this is that Hill's estimator is efficient
and, moreover, is related to the m.l. estimator of the scale parameter in the exponential model. Notice that the estimation of the tail index based on the k largest order statistics, with k fixed and n → ∞, is equivalent to estimating the scale parameter in the exponential model for the fixed sample size k. Hill's estimator and related estimators were extensively studied in the literature [e.g. de Haan and Resnick (1980), Hall (1982b), Hall and Welsh (1984), Häusler and Teugels (1985), and Smith (1987)]. The estimation of the endpoint of d.f.'s in the Weibull case was treated by Hall (1982a).
Falk (1985b) took up Weiss' approach of approximating models and derived the properties, essentially as known in the literature (Hall (1982b), Häusler and Teugels (1985)), of Hill's estimator by using the properties of the m.l. estimator in the exponential model (compare with Sections 9.4 and 9.5).
The method of taking maxima of subsamples is due to Gumbel; a typical
example is to take annual maxima. The results of Sections 9.5 and 9.6 are
partly taken from Reiss (1987). A comparison of the two different methods,
namely, to base the inference on the k largest order statistics and, respectively,
to use the subsample method, was also carried out in the paper by Hüsler and Tiago de Oliveira (1988) within a parametric framework.
The estimation of the parameters of extreme value d.f.'s is related to the estimation of the q.f. near the ends of the support (see Section 9.7). This
subject was dealt with in the articles by Weissman (1978), Boos (1984), Joe
(1987), Smith (1987), and Smith and Weissman (1987), among others. In this
context another interesting paper is that of Heidelberger and Lewis (1984)
who suggested applying the subsample method to reduce the possible correlation of the r.v.'s and to reduce the problem of estimating extreme quantiles to that of estimating the median; moreover, reducing the sample size in certain simulations by applying the subsample method may also have computational advantages.
The statistical procedures in Sections 9.2-9.6 are either based on the k
largest order statistics or on k subsamples. The choice of the number k
is crucial for the performance of the statistical procedures. The optimal
choice heavily depends on the given model as is pointed out in this chapter.
Some work has been done concerning the selection of the model; we refer to
Pickands (1975), Hall and Welsh (1985), and to Section 9.5 for some results.
The advice of DuMouchel (1983) to take the upper 10 per cent of the sample
might be valuable for practitioners. The visual comparison between sample
and extreme value d.f.'s gives further insight into the problem.
CHAPTER 10
Approximate Sufficiency of
Sparse Order Statistics
The map

    T    (10.1.1)

carries the model 𝒫 to the model 𝒬. Notice that the map T is not one-to-one, and hence to return from 𝒬 to 𝒫 one has to make use of a Markov kernel.
Markov Kernels
A Markov kernel K carrying mass from the probability space (S_1, ℬ_1, Q) to (S_0, ℬ_0) has the following two properties:

(a) K(·|y) is a probability measure on ℬ_0 for every y ∈ S_1, and
(b) K(B|·) is measurable for every B ∈ ℬ_0.
Recall that

    KQ(B) := ∫ K(B|·) dQ    (10.1.2)

defines a probability measure KQ on ℬ_0. An important special case is the kernel

    K(B|y) = 1_B(T(y)) = ε_{T(y)}(B)

with ε_x denoting the Dirac measure with mass 1 at x. In this case, given y the value T(y) is chosen "with probability one."
which means that P_θ can be reconstructed from Q_θ by means of the Markov kernel K.
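On a finite space a Markov kernel is simply a row-stochastic matrix, and (10.1.2) becomes a finite sum; the Dirac case K(B|y) = ε_{T(y)}(B) is a 0-1 matrix. The following toy example (all numbers are our own) illustrates properties (a) and (b):

```python
# S1 = {0, 1, 2}, S0 = {0, 1}.  Row y of K is the probability
# measure K(.|y) on S0 (property (a)); column b of K is the
# measurable function y -> K({b}|y) (property (b)).
K = [
    [0.7, 0.3],   # K(.|y = 0)
    [0.2, 0.8],   # K(.|y = 1)
    [0.5, 0.5],   # K(.|y = 2)
]
Q = [0.5, 0.25, 0.25]   # a probability measure on S1

# (10.1.2): KQ({b}) = sum over y of K({b}|y) * Q({y})
KQ = [sum(Q[y] * K[y][b] for y in range(3)) for b in range(2)]

# Dirac special case: K(B|y) = 1_B(T(y)) for a map T: S1 -> S0,
# i.e. given y the value T(y) is chosen "with probability one"
T = [0, 1, 1]
K_dirac = [[1.0 if T[y] == b else 0.0 for b in range(2)] for y in range(3)]
KQ_dirac = [sum(Q[y] * K_dirac[y][b] for y in range(3)) for b in range(2)]
```

KQ is again a probability measure, and in the Dirac case KQ is exactly the image measure of Q under T.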
    P_θ(C) = ∫ K(C|·) dQ_θ,    θ ∈ Θ.    (10.1.4)
The same conclusion holds if one starts with a critical function ψ defined on S_0. The Fubini theorem for Markov kernels implies that

    ∫ ψ̄ dQ_θ = ∫ ψ dP_θ,    θ ∈ Θ,    (10.1.5)

so that the critical functions ψ and

    ψ̄ = ∫ ψ(x) K(dx|T)

are of equal performance w.p.1.
Recall that the Neyman criterion provides a powerful tool for the verification of sufficiency of T. The sufficiency holds if the density p_θ of P_θ (w.r.t. some dominating measure) can be factorized in the form p_θ = r(h_θ ∘ T).
EXAMPLES 10.1.1.
(i) Let 𝒫 be a family of uniform distributions with unknown location parameter. Then (X_{1:n}, X_{n:n}) is sufficient.
(ii) Let 𝒫 be a family of exponential distributions with unknown location parameter. Then X_{1:n} is sufficient.
𝒬 is said to be ε(θ)-deficient w.r.t. 𝒫 if for some Markov kernel K,

    sup_B |P_θ(B) − KQ_θ(B)| ≤ ε(θ),    θ ∈ Θ,    (10.1.6)

where K ranges over all Markov kernels from (S_1, ℬ_1) to (S_0, ℬ_0). The deficiency δ(𝒬, 𝒫) of 𝒬 w.r.t. 𝒫 measures the amount of information which is needed so that 𝒬 is more informative than 𝒫. If TP_θ = Q_θ, θ ∈ Θ, then δ(𝒫, 𝒬) = 0. Notice that the symmetric deficiency between 𝒬 and 𝒫 is

    Δ(𝒬, 𝒫) = max{δ(𝒬, 𝒫), δ(𝒫, 𝒬)}.    (10.1.7)
Moreover, for every critical function ψ one has

    |∫ ψ̄ dQ_θ − ∫ ψ dP_θ| ≤ sup_B |P_θ(B) − KQ_θ(B)|,    (10.1.8)

and hence

    |∫ ψ̄ dQ_θ − ∫ ψ dP_θ| ≤ δ(𝒫, 𝒬).    (10.1.9)
Consider the location model given by densities

    f(x − θ)    (10.1.10)

with f being fixed. Assume that f(x) > 0 for a ≤ x ≤ b, and = 0 otherwise. A typical example is given by the uniform density

    f = 2^{−1} 1_{[−1,1]}.    (10.1.11)

Denote by 𝒫⁰_{0,n} the special model under condition (10.1.11). Recall from Example 10.1.1 that (X_{1:n}, X_{n:n}) is a sufficient statistic in this case.
Step 1 (Approximate Sufficiency of (X_{1:n}, X_{n:n})). Under weak regularity conditions it can be shown that (X_{1:n}, X_{n:n}) is still approximately sufficient for the location model 𝒫_{0,n}. We refer to Weiss (1979b) for a global treatment and to Janssen and Reiss (1988) for a local "one-sided" treatment of this problem. The technique for proving such a result will be developed in the next section. Regularity conditions have to guarantee that no further jumps of the density occur besides those at the points a, b. Let 𝒫_{1,n} = {P_{1,n,θ}} denote the model of distributions of (X_{1:n}, X_{n:n}) under the parameter θ.
Approximate sufficiency means that there exists a Markov kernel K_1 such that P_{0,n,θ} can approximately be rebuilt as K_1P_{1,n,θ}. In terms of ε-deficiency we have

    Δ(𝒫_{0,n}, 𝒫_{1,n}) ≤ ε(n),    (10.1.12)

where ε(n) → 0, n → ∞. We remark that ε(n) = O(n^{−1}) under certain regularity conditions. In the special case of (10.1.11), obviously,

    Δ(𝒫⁰_{0,n}, 𝒫_{1,n}) = 0.    (10.1.13)
Step 2 (Asymptotic Independence of X_{1:n} and X_{n:n}). Next, X_{1:n} and X_{n:n} will be replaced by independent random variables Y_{1:n} and Y_{n:n}.    (10.1.14)

The reduction to simpler models is not yet finished. Under mild conditions (see Section 5.2), the extremes Y_{1:n} and Y_{n:n} have an exponential distribution with remainder term of order O(n^{−1}). More precisely, if the extremes Y_{i:n}, i = 1, n, are generated under the parameter θ then

    sup_B |P_θ{(Y_{1:n} − a) ∈ B} − Q_{1,n,θ}(B)| = O(n^{−1})    (10.1.16)
and

    sup_B |P_θ{(Y_{n:n} − b) ∈ B} − Q_{2,n,θ}(B)| = O(n^{−1}),

where, for θ = 0, the approximating distributions have the densities

    q_{1,n}(x) = nf(a) exp[−nf(a)x]    if x ≥ 0,  and = 0 if x < 0,    (10.1.17)

and

    q_{2,n}(y) = nf(b) exp[nf(b)y]    if y ≤ 0,  and = 0 if y > 0.

Combining the steps above, the deficiency between the original model and the model of the independent exponential extremes is of order

    O(ε(n) + n^{−1}).    (10.1.18)
One may obtain a fixed asymptotic model by starting with the model of
distributions of n(Y_{1:n} − a) and n(Y_{n:n} − b) under local parameters nθ in place of θ.
Step 4 (Estimation of the Location Parameter). In a location parameter model
one uses estimators that are equivariant under translations; that is, given the model 𝒫_{3,n}, the estimator has the property

    θ̂_n(x + c, y + c) = θ̂_n(x, y) + c.    (10.1.19)

If θ̂_n is an equivariant estimator for the model 𝒫_{3,n} then

    θ̂_n(X_{1:n} − a, X_{n:n} − b)    (10.1.20)

is an equivariant estimator of θ in the original model. Notice that the likelihood function

    θ → Π_{i=1}^n 1_{[θ−1,θ+1]}(X_{i:n})

has its maximum at any θ between X_{n:n} − 1 and X_{1:n} + 1. Hence, the m.l. principle does not lead to a reasonable solution of the problem.
For location parameter models it is well known that Pitman estimators are
optimal within the class of equivariant estimators (see e.g. Ibragimov and
Has'minskii (1981), page 22, lines 1-9).
It is a simple exercise to verify that

    (X_{1:n} + X_{n:n})/2    (10.1.22)

is a Pitman estimator w.r.t. any sub-convex loss function L(· − ·). Note that L(· − ·) is sub-convex if L is symmetric about zero and L|[0, ∞) is nondecreasing. If L is strictly increasing then the Pitman estimator is uniquely determined.

Let us return to the ultimate model 𝒫_{3,n}. A Pitman estimate θ̂_n(x, y) w.r.t. the loss function L(· − ·) minimizes the posterior risk

    θ → ∫ L(θ − u) π(u|x, y) du,    (10.1.23)

which, the posterior here being uniform, amounts to minimizing

    θ → ∫_y^x L(θ − u) du.    (10.1.24)
The minimizing value is the midpoint (x + y)/2; carried back to the original model this leads to the estimator

    [X_{1:n} + X_{n:n} − (a + b)]/2.    (10.1.25)
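For the uniform density (10.1.11) the estimator (10.1.25) is essentially the midrange, and its error decreases at the rate 1/n rather than n^{−1/2}. A simulation sketch (the rate comparison is our own illustration):

```python
import random

random.seed(6)
theta = 0.7
a, b = -1.0, 1.0          # support endpoints, as for the density (10.1.11)

def estimator(xs):
    # (10.1.25): [X_{1:n} + X_{n:n} - (a + b)] / 2
    return (min(xs) + max(xs) - (a + b)) / 2.0

def mean_abs_error(n, reps=2000):
    total = 0.0
    for _ in range(reps):
        xs = [theta + random.uniform(a, b) for _ in range(n)]
        total += abs(estimator(xs) - theta)
    return total / reps

err_100 = mean_abs_error(100)     # roughly of order 1/(n + 1)
err_1000 = mean_abs_error(1000)
```

Multiplying the sample size by 10 shrinks the mean absolute error by roughly a factor of 10 (the n-rate), whereas for a regular location model the factor would only be about 10^{1/2} ≈ 3.2.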
Consider now models

    𝒫 = {P_{θ,g}: θ ∈ Θ, g ∈ G(θ)}    and    𝒬 = {Q_{θ,h}: θ ∈ Θ, h ∈ H(θ)}

where g and h may be regarded as nuisance parameters. The notions and the results above carry over to the present framework.
𝒬 is said to be ε-deficient w.r.t. 𝒫 if there is a Markov kernel K such that

    sup_B |P_{θ,g}(B) − KQ_{θ,h}(B)| ≤ ε(θ, g, h),    θ ∈ Θ, g ∈ G(θ), h ∈ H(θ),    (10.1.26)

and the deficiency is

    δ(𝒬, 𝒫) = inf_K sup_{θ,g,h} sup_B |P_{θ,g}(B) − KQ_{θ,h}(B)|,    (10.1.27)

where K ranges over all Markov kernels from (S_1, ℬ_1) to (S_0, ℬ_0). Moreover, the symmetric deficiency of 𝒬 and 𝒫 is again defined by

    Δ(𝒬, 𝒫) = max{δ(𝒬, 𝒫), δ(𝒫, 𝒬)}.    (10.1.28)
To prove such a result one has to construct an appropriate Markov kernel which carries the second model back to the original model.
Let X_{1:n} ≤ ... ≤ X_{n:n} be the order statistics of n i.i.d. random variables with common d.f. F which is assumed to be continuous. Theorem 1.8.1 provides the conditional distribution

    K_n(·|x)    (10.2.1)

of the order statistic (X_{1:n}, ..., X_{n:n}) conditioned on (X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}) = x. Recall that K_n is a Markov kernel having the "reproducing" property

    K_nP_n(B) = ∫ K_n(B|·) dP_n = P{(X_{1:n}, X_{2:n}, ..., X_{n:n}) ∈ B}.    (10.2.2)
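The kernel in (10.2.1) has an explicit sampling form: given the sparse values, the missing order statistics in each gap are distributed as order statistics of i.i.d. uniform r.v.'s on that gap (Theorem 1.8.1). The sketch below (our own code) rebuilds the full vector for the uniform d.f. and checks the reproducing property (10.2.2) for one marginal moment:

```python
import random

def rebuild_from_sparse(n, ridx, rng):
    # draw uniform order statistics, keep only those with ranks in ridx,
    # then refill each gap with sorted i.i.d. uniforms on that gap --
    # the sampling form of the kernel K_n of (10.2.1) for the uniform d.f.
    full = sorted(rng.random() for _ in range(n))
    sparse = [full[r - 1] for r in ridx]
    bounds = [0.0] + sparse + [1.0]
    ranks = [0] + list(ridx) + [n + 1]
    rebuilt = []
    for i in range(1, len(ranks)):
        lo, hi = bounds[i - 1], bounds[i]
        gap = ranks[i] - ranks[i - 1] - 1
        rebuilt += sorted(lo + (hi - lo) * rng.random() for _ in range(gap))
        if i < len(ranks) - 1:
            rebuilt.append(bounds[i])
    return rebuilt

rng = random.Random(7)
n, ridx = 20, [5, 10, 15]
reps = 40000
# reproducing property (10.2.2): the rebuilt X_{3:20} should again be
# distributed as a uniform order statistic, in particular mean 3/(n + 1)
mean_3 = sum(rebuild_from_sparse(n, ridx, rng)[2] for _ in range(reps)) / reps
```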
Theorem 10.2.1. Let K_n* denote the corresponding Markov kernel under the uniform d.f. on (0, 1). Then

    sup_B |P{(X_{1:n}, ..., X_{n:n}) ∈ B} − K_n*P_n(B)|
        ≤ δ(F) [Σ_{j=1}^{k+1} (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 1)²/(n + 1))]^{1/2}    (10.2.3)

where

    δ(F) = sup_{y∈(0,1)} |f′(y)| / inf_{y∈(0,1)} f²(y).    (10.2.4)
PROOF. Let K_n denote the Markov kernel in (10.2.1) given the d.f. F. Applying Theorem 1.8.1 we obtain

    sup_B |K_nP_n(B) − K_n*P_n(B)| ≤ ∫ sup_B |K_n(B|x) − K_n*(B|x)| dP_n(x)
        ≤ ∫ sup_B |(⊗_j P_{j,x})(B) − (⊗_j Q_{j,x})(B)| dP_n(x)    (1)

where P_{r_i,x} = Q_{r_i,x} are the Dirac measures at x_i for i = 1, ..., k; moreover, for i = 1, ..., k + 1 and j = r_{i−1} + 1, ..., r_i − 1 the probability measures P_{j,x} and Q_{j,x} are defined by densities p_{j,x} and

    q_{j,x} = 1_{(x_{i−1},x_i)}/(x_i − x_{i−1}),

where x_0 = 0 and x_{k+1} = 1. Putting g_{j,x} = (p_{j,x}/q_{j,x}) − 1, the right-hand side of (1) can be bounded, by means of inequality (3.3.10), by a term of the form

    ρ(F) [Σ_{j=1}^{k+1} (r_j − r_{j−1} − 1) E(x_j − x_{j−1})²]^{1/2}.    (2)

The second inequality in (2) can easily be verified by using the representation

    g_{j,x}(y) = (f′(v)/f(u))(y − u)    (3)

for suitable intermediate points u and v. The assertion now follows from

    sup_B |P{(X_{1:n}, ..., X_{n:n}) ∈ B} − K_n*P_n(B)| ≤ sup_B |K_nP_n(B) − K_n*P_n(B)|    (4)

together with the moment bound

    E(x_j − x_{j−1})² ≤ ((r_j − r_{j−1} + 1)/(n + 1))² / inf_{y∈(0,1)} f²(y).    (5)    □
Notice that δ(F) can be regarded as a distance between F and the uniform d.f. F_0.
EXAMPLE 10.2.2. If the differences r_i − r_{i−1} are of order O(m(n)) and k ≡ k(n) is of order O(n/m(n)), which means that, roughly speaking, the indices r_i are equidistant, then the right-hand side of (10.2.3) is of order

    O(δ(F)m(n)/n^{1/2}).

Thus, if m(n) = o(n^{1/2}) (entailing that the number k of order statistics has to be larger than n^{1/2}) then the right-hand side of (10.2.3) goes to zero as n goes to infinity even if δ(F) is bounded away from zero. If n^{1/2} = O(m(n)) then F should also depend on n. In the statistical context this means that our model has to shrink towards the uniform d.f. as n goes to infinity.
A typical situation for such a dependence on the sample size n occurs in the context of a goodness-of-fit test when one is testing the uniform d.f. F_0 against an alternative F_n having a density f_n given by

    f_n(x) = 1 + β(n)n^{−1/2}h(x).
Local Formulation

Theorem 10.2.1 may be extended in various directions. In cases where one is only interested in local properties of F, our considerations will be based on a statistic only depending on certain extreme or central order statistics, say, X_{r:n} ≤ X_{r+1:n} ≤ ... ≤ X_{s:n} where 1 ≤ r ≤ s ≤ n. Again the number of order statistics may be reduced. If r_1 = r and r_k = s then, in contrast to the conditions of Theorem 10.2.1, it suffices to assume that 0 ≤ α(F) < ω(F) ≤ 1. For the formulation of Addendum 10.2.3 we introduce the projection τ ≡ τ(r, s) defined by
Note that in the following context, Markov kernels will rebuild the joint distribution of X_{r:n}, X_{r+1:n}, ..., X_{s:n}. Define a Markov kernel adjusted to the present problem, namely,

    K_{n,τ}*(·|x) = τK_n*(·|x).

Note that K_{n,τ}*(·|x) is a marginal distribution of K_n*(·|x). Check that K_{n,τ}*(·|x) is the conditional distribution of (U_{r:n}, U_{r+1:n}, ..., U_{s:n}) given (U_{r_1:n}, U_{r_2:n}, ..., U_{r_k:n}) = x.
Addendum 10.2.3 provides the corresponding bound with

    δ(F) = sup_{y∈(α(F),ω(F))} |f′(y)| / inf_{y∈(α(F),ω(F))} f²(y).
Transformed Models

The results until now are concerned with d.f.'s F close to the uniform d.f. F_0 on (0, 1). If we fix some other continuous d.f., say G_0 in place of F_0, then the probability integral transformation may be applied to reduce the problem again to the former case.

The d.f.'s G close to G_0 have to be of the form G = F ∘ G_0 where F (being equal to G ∘ G_0^{−1}) has to fulfill the conditions of Theorem 10.2.1. If Y_{i:n} are the order statistics of r.v.'s with common d.f. G then X_{i:n} = G_0(Y_{i:n}) are the order statistics of r.v.'s with common d.f. F. Thus, Theorem 10.2.1 applies to X_{i:n}.

In order to formulate the problem for the original order statistics Y_{i:n} we introduce the Markov kernel M_n* where M_n*(·|y) is the conditional distribution of (Y_{1:n}, Y_{2:n}, ..., Y_{n:n}) given (Y_{r_1:n}, Y_{r_2:n}, ..., Y_{r_k:n}) = y in the special case of G = G_0 (where again the dependence of M_n* on r_1, ..., r_k will be suppressed).
Theorem 10.2.4. Let 1 ≤ k ≤ n and 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1. Let F be a continuous d.f. with α(F) = 0 and ω(F) = 1. Assume that F has two derivatives on (0, 1). Put f = F′. Denote by Q_n the joint distribution of the order statistics Y_{r_1:n}, ..., Y_{r_k:n} where the Y_{i:n} are the order statistics of n i.i.d. random variables with common d.f. G_1 = F ∘ G_0. Then,

    sup_B |P{(Y_{1:n}, ..., Y_{n:n}) ∈ B} − M_n*Q_n(B)|
        ≤ δ(F) [Σ_{j=1}^{k+1} (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 1)²/(n + 1))]^{1/2}    (10.2.6)

where again

    δ(F) = sup_{y∈(0,1)} |f′(y)| / inf_{y∈(0,1)} f²(y).
Theorem 10.2.4 was stated in such a way that it can easily be deduced from Theorem 10.2.1; however, this formulation looks rather artificial. Further insight into the nature of the term δ(F) can be obtained by means of a different representation of the density f. From P.1.5 and Remark 1.5.3 we conclude that G_1 has the G_0-density g = f ∘ G_0. Hence, f = g ∘ G_0^{−1}, according to Criterion 1.2.3. Thus, the conditions of Theorem 10.2.4 can be reformulated in the following way: assume that G_1 has the G_0-density g so that g ∘ G_0^{−1} is differentiable. Moreover, the term δ(F) can be replaced by

    sup_{y∈(0,1)} |(g ∘ G_0^{−1})′(y)| / inf_{y∈(0,1)} (g ∘ G_0^{−1})²(y).
Notice that

    sup_B |P{(Y_{1:n}, ..., Y_{n:n}) ∈ B} − M_n*Q_n(B)| = sup_B |P{(X_{1:n}, X_{2:n}, ..., X_{n:n}) ∈ B} − K_n*P_n(B)|.    (10.2.7)

PROOF. From P.1.5 we know that G_0^{−1}(η) is a r.v. with d.f. F ∘ G_0 if η is a r.v. with d.f. F. This implies that

    (Y_{1:n}, Y_{2:n}, ..., Y_{n:n}) = (G_0^{−1}(X_{1:n}), G_0^{−1}(X_{2:n}), ..., G_0^{−1}(X_{n:n})).    (1)
Once it is shown that

    M_n*Q_n(B) = EM_n*(B|G_0^{−1}(X_{r_1:n}), G_0^{−1}(X_{r_2:n}), ..., G_0^{−1}(X_{r_k:n}))    (4)

and that M_n*(·|y_1, ..., y_k) is induced by K_n*(·|x_1, ..., x_k), whenever α(F) < x_1 < x_2 < ... < x_k < ω(F) with y_i denoting G_0^{−1}(x_i), then it is apparent that (10.2.7) holds. Since G_0 is continuous we know that G_0^{−1} is strictly increasing; thus,

    α(G_0) =: y_0 < y_1 < ... < y_k < y_{k+1} := ω(G_0).

Put x_0 = 0 and x_{k+1} = 1. Let ε_x denote the Dirac measure at x (with mass 1). Moreover, Q denotes the probability measure corresponding to G_0, and Q̄ is the uniform distribution on (0, 1). It is obvious that ε_{y_i} is induced by ε_{x_i} and G_0^{−1}. Moreover, from P.1.6 we know that the truncation of Q to the interval (y_{i−1}, y_i), say Q_{y_{i−1},y_i}, is induced by Q̄_{x_{i−1},x_i} and G_0^{−1} for i = 1, ..., k + 1. Thus, Theorem 1.2.5(i) yields that M_n*(·|y_1, ..., y_k) is induced by K_n*(·|x_1, ..., x_k) and the map

    (u_1, u_2, ..., u_n) → (G_0^{−1}(u_1), G_0^{−1}(u_2), ..., G_0^{−1}(u_n)).    □
Corollary 10.3.1. Let X_{i:n} be the ith order statistic of n i.i.d. random variables with d.f. G given by G(x) = F((x − µ)/σ) for µ < x < µ + σ, with −∞ < µ < ∞ and σ > 0. It is assumed that F is continuous and has two derivatives on (0, 1) = (α(F), ω(F)). Put f = F′. Let 1 ≤ r = r_1 < r_2 < ... < r_k = s ≤ n. Let K_{n,τ}* denote the Markov kernel defined in Addendum 10.2.3, and let again P_n denote the joint distribution of the sparse order statistics X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}. Then,

    sup_B |P{(X_{r:n}, X_{r+1:n}, ..., X_{s:n}) ∈ B} − K_{n,τ}*P_n(B)|
        ≤ δ(F) [Σ_{j=2}^{k} (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 3)²/(n + 1))]^{1/2}

where again

    δ(F) = sup_{y∈(0,1)} |f′(y)| / inf_{y∈(0,1)} f²(y).

PROOF. Apply Addendum 10.2.3 after the transformation x → (x − µ)/σ, noting that the kernel K_{n,τ}* does not depend on µ and σ.    □
Given an estimator θ̂_n based on the sparse order statistics,

    M̂_n*(·|x) := M_{n,θ̂_n(x)}*(·|x)    (10.3.1)

is a Markov kernel. Let again P_n denote the joint distribution of the sparse order statistics X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}. We shall use M̂_n*P_n as an approximation to the joint distribution of X_{r:n}, X_{r+1:n}, ..., X_{s:n}. The accuracy of this approximation will depend on the performance of the estimator θ̂_n and the distance of G from the parametric family {G(·,θ): θ ∈ Θ}. We assume that the d.f.'s G(·,θ) have densities, say, g(·,θ).
Theorem 10.3.2 will be proved under a local Lipschitz condition. Given a fixed parameter θ_0 ∈ Θ assume that

    |(∂/∂y)G(G^{−1}(y_1,θ_0),θ) − (∂/∂y)G(G^{−1}(y_2,θ_0),θ)| ≤ C||θ − θ_0||_2 |y_1 − y_2|    (10.3.2)

for every θ ∈ Θ with ||θ − θ_0||_2 ≤ ε, C ≥ 0 and y_i with 0 < q_1 < y_i < q_2 < 1 for i = 1, 2.

In (10.3.2) it is implicitly assumed that g(x,θ) > 0 for every x with G^{−1}(q_1,θ_0) < x < G^{−1}(q_2,θ_0) and with ||θ − θ_0||_2 ≤ ε. If Θ = {θ_0}, that is, the problem of Section 10.2, then (10.3.2) holds with C = 0. Another set of conditions involving the partial derivatives (∂²/∂θ_i∂y) log g will be examined in Criterion 10.3.3.
Theorem 10.3.2. Let 1 ≤ k ≤ n and 1 ≤ r = r_1 < r_2 < ... < r_k = s ≤ n. Let X_{i:n} be the ith order statistic of n i.i.d. random variables with common d.f. G = F ∘ G(·,θ_0) where θ_0 ∈ Θ and F is a d.f. with α(F) = 0 and ω(F) = 1. Moreover, suppose that F has two derivatives on (0, 1). Put again f = F′. Suppose that the d.f.'s G(·,θ) fulfill condition (10.3.2) for some constants ε, C > 0 and 0 < q_1 < q_2 < 1. Then for every measurable and Θ-valued estimator θ̂_n we have, with M̂_n* defined as in (10.3.1),

    sup_B |P{(X_{r:n}, ..., X_{s:n}) ∈ B} − M̂_n*P_n(B)|
        ≤ [δ(F) + ρ(F, θ̂_n, C, ε)] [Σ_{j=2}^k (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 3)²/(n + 1))]^{1/2}
            + P{||θ̂_n − θ_0||_2 > ε}
where

    ρ(F, θ̂_n, C, ε) = (C / inf_{y∈(0,1)} f(y)) [E 1_A ||θ̂_n − θ_0||_2²]^{1/4}

with A = {||θ̂_n − θ_0||_2 ≤ ε}.

PROOF. Splitting off the event {||θ̂_n − θ_0||_2 > ε} and applying the triangle inequality we get

    sup_B |P{(X_{r:n}, X_{r+1:n}, ..., X_{s:n}) ∈ B} − M̂_n*P_n(B)|
        ≤ sup_B |P{(X_{r:n}, ..., X_{s:n}) ∈ B} − M_{n,θ_0}*P_n(B)| + sup_B |M_{n,θ_0}*P_n(B) − M̂_n*P_n(B)|
        ≤ δ(F) [Σ_{j=2}^k (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 3)²/(n + 1))]^{1/2}
            + [Σ_{j=2}^k (r_j − r_{j−1} − 1) ∫_A ψ_j(x) dP_n(x)]^{1/2} + P{||θ̂_n − θ_0||_2 > ε}    (1)

where, with θ̂ ≡ θ̂_n, the functions ψ_j are built from the densities h_{j,x}(·, θ̂(x)) and h_{j,x}(·, θ_0): the P_{r_i,x} and Q_{r_i,x} are the Dirac measures at x_i, and for i = 2, ..., k and j = r_{i−1} + 1, ..., r_i − 1 the probability measures P_{j,x} and Q_{j,x} are defined by the densities h_{j,x}(·, θ̂(x)) and h_{j,x}(·, θ_0). Now (1) is immediate from inequality (3.3.10) and the Schwarz inequality.

For every x ∈ A we obtain, with z_j = G(x_j, θ_0), that

    ψ_j(x) ≤ C² ||θ̂(x) − θ_0||_2² |z_j − z_{j−1}|².    (2)
Indeed, ψ_j(x) can be written as an integral over the interval (G^{−1}(z_{j−1},θ_0), G^{−1}(z_j,θ_0)) involving the ratio g(y, θ̂(x))/g(y, θ_0) and the normalizing factor (z_j − z_{j−1})/(G(G^{−1}(z_j,θ_0),θ̂(x)) − G(G^{−1}(z_{j−1},θ_0),θ̂(x))), and hence (2) follows at once from condition (10.3.2) by noting that q_1 < z_1 < z_2 < ... < z_k < q_2.

It is immediate from (2) and the Schwarz inequality that

    ∫_A ψ_j(x) dP_n(x) ≤ C² [E 1_A ||θ̂_n − θ_0||_2⁴]^{1/2} [E(G(X_{r_j:n},θ_0) − G(X_{r_{j−1}:n},θ_0))⁴]^{1/2},    (3)

and a standard moment bound yields

    E(G(X_{r_j:n},θ_0) − G(X_{r_{j−1}:n},θ_0))² ≤ ((r_j − r_{j−1} + 3)/(n + 1))².    (4)

Combining (1), (3), and (4) the assertion follows.    □
Criterion 10.3.3. Assume that Θ is an open and convex subset of the Euclidean d-space. Assume that the partial derivatives (∂²/∂θ_j∂y) log g exist. Then condition (10.3.2) holds with

    C = exp[ε|q_2 − q_1|K(g)]K(g)

where K(g) denotes the supremum of the mixed partial derivatives |(∂²/∂θ_j∂y) log g|, with the supremum ranging over all (y, θ) with q_1 < y < q_2 and ||θ − θ_0||_2 ≤ ε.

PROOF. Apply the mean value theorem, making use of the bound K(g) for the mixed partial derivatives of log g.    □
Final Remarks
Let us examine the problem of testing the parametric null-hypothesis {G(·,θ): θ ∈ Θ} against certain nonparametric alternatives G_n. It is easy to see that G_n is of the form F_n ∘ G(·,θ_0), where F_n has the density f_n(y) = 1 + h(G^{−1}(y,θ_0))α(n), if, and only if, G_n has the density

    g_n(x) = g(x,θ_0)(1 + h(x)α(n))

where ∫ h(x)g(x,θ_0) dx = 0. In this case, if h and h′(G^{−1}(·,θ_0))/g(G^{−1}(·,θ_0)) are bounded, we have δ(F_n) = O(α(n)) and inf_{y∈(0,1)} f_n(y) ≥ 1 − O(α(n)).
Within the present framework one has to find an appropriate estimator of θ. The problem of constructing estimators which are optimal in the sense of minimizing the upper bound in Theorem 10.3.2 is also connected to the problem of finding an "optimal" parameter θ_0 which makes δ(F) = δ(G ∘ G^{−1}(·,θ_0)) small. Given a functional T on the family of all q.f.'s so that T(G^{−1}(·,θ)) = θ, the statistical functional T(F_n^{−1}) is an appropriate estimator of T(G^{−1}) and thus of θ_0 if G^{−1} is close to G^{−1}(·,θ_0). Since the estimator θ̂_n is only allowed to depend on the sparse order statistics X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}, one has to take a statistical functional w.r.t. a version of the sample q.f. which is based on these sparse order statistics.
Suppose that

    sup_B |P{(X_{r:n}, ..., X_{s:n}) ∈ B} − M_n*P_n(B)| ≤ ε_0(G, r, n)    (10.4.1)

where X_{1:n} ≤ ... ≤ X_{n:n} are the order statistics of n i.i.d. random variables with common d.f. G, and P_n is the joint distribution of X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}. The decisive point in (10.4.1) is that the Markov kernel M_n* is independent of G.

Let us also apply the result of Section 4.5, namely, that central order statistics X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n} are approximately normally distributed.
More precisely,

    sup_B |P{(X_{r_1:n}, ..., X_{r_k:n}) ∈ B} − P{(Y_1′, ..., Y_k′) ∈ B}| ≤ ε_1(G, r, n)    (10.4.2)

where (Y_1′, ..., Y_k′) is a normal random vector with mean vector

    µ(G) = (G^{−1}(r_1/(n + 1)), ..., G^{−1}(r_k/(n + 1)))    (10.4.3)

and covariance matrix Σ(G) = (σ_{i,j}) given by

    σ_{i,j} = (r_i/(n + 1))(1 − r_j/(n + 1)) / [(n + 1) g(G^{−1}(r_i/(n + 1))) g(G^{−1}(r_j/(n + 1)))]    (10.4.4)

for 1 ≤ i ≤ j ≤ k.
Since (10.4.2) can be extended to [0, 1]-valued measurable functions (see P.3.5) we obtain

    sup_B |M_n*P_n(B) − M_n*N_{(µ(G),Σ(G))}(B)| ≤ ε_1(G, r, n).    (10.4.5)

Combining (10.4.1) and (10.4.5),

    sup_B |P{(X_{r:n}, ..., X_{s:n}) ∈ B} − M_n*N_{(µ(G),Σ(G))}(B)| ≤ ε_0(G, r, n) + ε_1(G, r, n).    (10.4.6)
(10.4.6) connects the following two models. The first one is given by joint distributions of order statistics X_{r:n}, ..., X_{s:n} with "parameter" G; the second one is a family of k-dimensional normal distributions with parameters (µ(G), Σ(G)). In the sense of (10.1.26), the model given by normal distributions N_{(µ(G),Σ(G))} is ε(G, r, n)-deficient w.r.t. the model determined by the order statistics X_{r:n}, X_{r+1:n}, ..., X_{s:n}.

If (10.4.6) holds for r = 1 and s = n then the following result also holds: let ξ_1, ξ_2, ..., ξ_n be the original i.i.d. random variables. Since the order statistic is sufficient we find a Markov kernel M_n** (see also P.1.29) such that the distribution of (ξ_1, ..., ξ_n) can be rebuilt from the normal model as well.    (10.4.7)
Next we present the main ideas of an example due to Weiss (1974, 1977) where the approximating normal distribution depends on the original d.f. F only through the mean vector. Moreover, we indicate the possibility of calculating a bound of the remainder term of the approximation.
EXAMPLE 10.4.1. As a continuation of Example 10.2.2, the uniform d.f. F_0 on (0, 1) will be tested against a composite alternative of d.f.'s F_n having densities f_n given by

    f_n(x) = 1 + β(n)h(x),    0 ≤ x ≤ 1,

and = 0, otherwise, where ∫₀¹ h(x) dx = 0. The term β(n) will be specified later.
Part 1 (Asymptotic Sufficiency). Recall from Example 10.2.2 that sparse order statistics X_{r_1:n}, ..., X_{r_k:n} are approximately sufficient; put

    λ_i = r_i/(n + 1).

Let β_{i,i} and β_{i,i−1} be given as in the proof of Lemma 4.4.2. Recall that the β_{i,j} define a map S such that SN_{(0,Σ)} = N_{(0,I)} where Σ = (σ_{i,j}) and σ_{i,j} = λ_i(1 − λ_j), i ≤ j. The decisive point is that these values do not depend on F. Define

    Z_i = β_{i,i}X_{r_i:n} + β_{i,i−1}X_{r_{i−1}:n},    i = 1, ..., k,    (10.4.8)

where β_{1,0} = 0. Notice that Z_1, ..., Z_k are known to the statistician, and hence tests may be based on these r.v.'s. The Z_i are closely related to spacings; however, the use of spacings would not lead to asymptotically independent r.v.'s (compare with P.4.4). Applying (10.4.2) we obtain that Z_1, ..., Z_k can be replaced by independent normal r.v.'s Y_1, ..., Y_k with unit variances and expectations equal to

    µ_i,    i = 1, ..., k.    (10.4.9)
Thus, we have

    sup_B |P{(Z_1, ..., Z_k) ∈ B} − P{(Y_1, ..., Y_k) ∈ B}| = o(1).    (10.4.10)

The problem is thereby reduced to testing the hypothesis

    µ = (µ_1, ..., µ_k) = 0    (10.4.11)

against alternatives built from the µ_i, i = 1, ..., k,
satisfying

    (∫₀¹ h²(x) dx)^{1/2} = δ    (10.4.13)

and hence of the form

    {µ: ||µ||_2 > 0}    (10.4.14)

under the additional requirement that the performance of the test procedure depends on the underlying parameter µ through ||µ||_2 only; thus, the test is invariant under orthogonal transformations. In Parts 4 and 5 we shall recall some basic facts from classical, parametric statistics.

Part 4 (A χ²-Test). Let us first consider the case where the alternative is {µ: ||µ||_2 > 0} without taking into account that h has to satisfy a certain smoothness condition that also restricts the choice of the parameters µ. The uniformly most powerful, invariant test of level α is given by the critical region

    C_k = {T_k > χ²_{k,α}}    (10.4.15)

where

    T_k = Σ_{i=1}^k Y_i²    (10.4.16)

and χ²_{k,α} is the (1 − α)-quantile of the central χ²-distribution with k degrees of freedom. According to Weiss (1977) the critical region C_k is also a Bayes test
for testing IIJlI12 = 0 against IIJlI12 = (j with prior probability uniformly dis-
314
tributed over the sphere {JI: IIJlllz = c5} (proof!). Moreover, Ck is minimax for
this testing problem.
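The χ²-test (10.4.15)-(10.4.16), combined with the CLT normalization (10.4.17), suggests for large k the approximate critical value k + z_α(2k)^{1/2}. The simulation below is our own check (the normal 0.95-quantile is hard-coded) that the resulting level is close to 5 per cent under the null hypothesis µ = 0:

```python
import math
import random

random.seed(8)
k = 400
z95 = 1.6448536269514722                 # 0.95-quantile of the standard normal
crit = k + z95 * math.sqrt(2.0 * k)      # approximate chi^2_{k, 0.05} quantile

def T(ys):
    # test statistic (10.4.16): the sum of squares
    return sum(y * y for y in ys)

reps = 5000
rejections = 0
for _ in range(reps):
    ys = [random.gauss(0.0, 1.0) for _ in range(k)]   # null hypothesis: mu = 0
    if T(ys) > crit:
        rejections += 1
level = rejections / reps
```

The observed level is typically slightly above 0.05 because the χ²-distribution is right-skewed; this skewness effect vanishes as k → ∞.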
Since Y_k = (Y_1, ..., Y_k) is a vector of normal r.v.'s with unit variance and mean vector μ we know that T_k is distributed according to a noncentral χ²-distribution with k degrees of freedom and noncentrality parameter ‖μ‖₂².
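As a concrete check of this distributional statement, the following stdlib-only Python sketch (the mean vector mu is a hypothetical example, not taken from the book) simulates T = Y₁² + ⋯ + Y_k² with independent Y_i ~ N(μ_i, 1) and compares the sample mean and variance with the noncentral χ² moments k + ‖μ‖₂² and 2k + 4‖μ‖₂²:

```python
import random

# Sketch (illustrative numbers): T = sum_i Y_i^2 with Y_i ~ N(mu_i, 1) follows
# a noncentral chi-square law with k degrees of freedom and noncentrality
# ncp = ||mu||_2^2, so E[T] = k + ncp and Var[T] = 2k + 4*ncp.
random.seed(0)

mu = [0.8, -0.3, 0.5, 0.0, 1.1]   # hypothetical mean vector
k = len(mu)
ncp = sum(m * m for m in mu)

n_sim = 200_000
samples = [sum(random.gauss(m, 1.0) ** 2 for m in mu) for _ in range(n_sim)]

mean = sum(samples) / n_sim
var = sum((t - mean) ** 2 for t in samples) / n_sim

print(round(mean, 2), "vs", k + ncp)         # sample mean vs k + ncp
print(round(var, 2), "vs", 2 * k + 4 * ncp)  # sample variance vs 2k + 4*ncp
```

These are exactly the two moments that enter the central limit normalization used in the text.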
If k ≡ k(n) tends to infinity as n → ∞, the central limit theorem implies that
(2k + 4‖μ‖₂²)^{−1/2} (∑_{i=1}^{k} (Y_i² − 1) − ‖μ‖₂²) (10.4.17)
is asymptotically standard normal. ...
C_k = {∑_{i=1}^{k} Z_i² > χ²_{k,α}}, (10.4.19)
(∫ h²(x) dx) + o(k⁰) (10.4.20)
(10.4.21)
where v_j = (v_j^{(1)}, ..., v_j^{(k)}), j = 1, ..., s, are orthonormal vectors w.r.t. the inner product ⟨x, y⟩ = ∑_{i=1}^{k} x_i y_i. The well-known solution of the problem is to take the critical region
(10.4.22)
where
(10.4.23)
Notice that T_s = ‖Ŷ_k‖₂² where Ŷ_k = ∑_{j=1}^{s} ⟨v_j, Y_k⟩ v_j is the orthogonal projection of Y_k onto the s-dimensional linear subspace. The statistic T_s is again
If s is fixed as n → ∞ then, obviously, our asymptotic considerations belong to parametric statistics. If s ≡ s(n) → ∞ as n → ∞ then, e.g. in view of the Fourier expansion of square integrable functions, the sequence of original models approaches the space of square integrable densities close to the uniform density, showing that the testing problem is of a nonparametric nature.
The foregoing remarks seem to be of some importance for nonparametric density testing (and estimation). Note that the functions h may belong to the linear space spanned by the trigonometric functions e_1, ..., e_s (see P.8.5(i)). So there is some relationship to the orthogonal series method adopted in nonparametric density estimation. The crucial problem in nonparametric density estimation is to find a certain balance between the variance and the bias of estimation procedures. Our present point of view differs from that taken up in the literature. First, we deduce the asymptotically optimal procedure w.r.t. the s(n)-dimensional model. These considerations belong to classical statistics. In a second step, we may examine the performance of the test procedure if the s(n)-dimensional model is incorrect.
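To make the projection statistic tangible, here is a small stdlib-only Python sketch (all numerical choices — the sine design, the observation vector — are hypothetical illustrations, not from the book). It builds s orthonormal vectors by Gram–Schmidt and checks that T_s = ∑_j ⟨v_j, y⟩² equals the squared norm of the orthogonal projection of y and never exceeds ‖y‖²:

```python
import math

k, s = 50, 3

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical design: sine functions evaluated at i/(k+1), then
# orthonormalized by Gram-Schmidt w.r.t. <x, y> = sum_i x_i y_i.
raw = [[math.sin(math.pi * j * i / (k + 1)) for i in range(1, k + 1)]
       for j in range(1, s + 1)]
basis = []
for v in raw:
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    norm = math.sqrt(dot(w, w))
    basis.append([wi / norm for wi in w])

y = [math.cos(1.7 * i) for i in range(k)]   # arbitrary observation vector

coeffs = [dot(b, y) for b in basis]
T_s = sum(c * c for c in coeffs)            # the projection statistic

# orthogonal projection of y onto span(v_1, ..., v_s)
proj = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(k)]

print(abs(T_s - dot(proj, proj)) < 1e-9)    # T_s = ||projection||^2
print(T_s <= dot(y, y) + 1e-9)              # projecting shrinks the norm
```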
that
... ∈ B}|, y ∈ (0, 1).
[Hint: Use the fact that f(y)/f(x) = exp[(f′(z)/f(z))(y − x)] with z between x and y.]
3. Theorem 10.2.1 holds with the upper bound replaced by
(c / inf_{y∈(0,1)} ...) ... (n + 1) ...
4. (i) If 0 = r_0 < r_1 < r_2 < ... < r_k = s, r_1 = 1 and α(F) = 0 then (10.2.5) holds with ∑_{j=2} replaced by ∑_{j=1}.
(ii) If r_1 < r_2 < ... < r_k < r_{k+1} = n + 1, s = n and ω(F) = 1 then (10.2.5) holds with ∑_{j=2} replaced by ∑_{j=2}^{k+1}.
5. Let r(x_1, ..., x_n) = (x_{n−k+1}, ..., x_n). Under the conditions of Addendum 10.2.3, if α(F) ≥ 0 and ω(F) = 1,
sup_B ... ≤ [sup_{y∈(α(F),1)} |f′(y)| / inf_{y∈(α(F),1)} f²(y)] k^{3/2}/n.
6. (i) Verify condition (10.3.2) with C = p(g)K(g) where K(g) is given as in Criterion 10.3.3 and
p(g) = sup_{‖θ−θ_0‖_2 ≤ δ} (sup_{q_1<y<q_2} (∂/∂y)G(G^{−1}(y, θ_0), θ) / inf_{q_1<y<q_2} (∂/∂y)G(G^{−1}(y, θ_0), θ)).
(ii) ... |(∂/∂y)G(G^{−1}(y, θ_0), θ)| ... q_1 ... (‖θ − θ_0‖_2 |y_1 − y_2|)² ...
7. ... (n + 1)^{1/2} (β_{i,i}(X_{r_i:n} − G^{−1}(λ_i)) + β_{i,i−1}(X_{r_{i−1}:n} − G^{−1}(λ_{i−1}))),
where g_i = g(G^{−1}(λ_i)). N(0,Σ) again denotes the k-variate normal distribution with mean vector zero and covariances σ_{i,j} = λ_i(1 − λ_j), 1 ≤ i ≤ j ≤ k. Prove that
sup_B |P(B) − N(0,Σ)(B)| ≤ ... + 2^{−1/2} [∑_{i} (σ_{i,i} − ... + 2 log g_i ...)]^{1/2}
where
... g_i^{−2} ... λ_{i−1}(1 − λ_i) (g_i ... (λ_i − λ_{i−1}) g_{i−1})² ... , i = 2, ..., k.
[Hint: Let H be the diagonal matrix with diagonal elements η_{i,i} = 1/g_i. Let Σ′ = B ∘ H ∘ Σ ∘ H^t ∘ B^t where B is defined as in the proof of Lemma 4.4.2. Notice that det(Σ′) = (det(H))².]
8. Specialize Example 10.4.1, Part 5, to trigonometric functions (see P.8.5).
9. Extend Example 10.4.1 to the composite null hypothesis of uniform distributions.
Bibliographical Notes
The reader who is interested in the theoretical background concerning the
comparison of experiments is referred to Torgersen (1976), Strasser (1985), and
Le Cam (1986). The article of Torgersen gives a short, illuminating introduction to this subject.
The magnificent idea to study a construction like that in Theorem 10.2.1
is due to L. Weiss (1974) who also gave some asymptotic results. The extension
of the problem from a single d.f. to a parametric family of d.f.'s was suggested
by Weiss (1980). Weiss carried out a detailed study in the location and scale
parameter case. Further insight into the problem of comparing models based
on order statistics was obtained by Reiss et al. (1984) where a sharp bound of
the remainder term of the approximation was also established. The present
approach is taken from Reiss (1986). Some results concerning the sufficiency
of extremes within a parametric framework can be found in the articles by
Weiss (1979b) and Janssen and Reiss (1988). In the second article the location
model of a Weibull sample is locally compared with location models defined
by
(S_m(α + θ))_{m≤k} and (S_m(α + θ))_{m=1,2,3,...}.
APPENDIX 1
Extending the definition of a q.f. (see (1.1.10)) we define the inverse ψ* of a real-valued, nondecreasing and right continuous function ψ with domain (α, ω) by setting
ψ*(y) = inf{t ∈ (α, ω): ψ(t) ≥ y} for −∞ < y < ∞ (A.1.1)
ψ⁻¹(y) = ψ*(y) for inf ψ(s) < y < sup ψ(s), (A.1.2)
that is, ψ⁻¹ is the restriction of ψ* to the interval (inf ψ(s), sup ψ(s)).
Thus, in the particular case of the q.f. we have ψ = F, (α, ω) = real line, (inf ψ(s), sup ψ(s)) = (0, 1), and ψ⁻¹ = F⁻¹. From the definitions of ψ* and ψ⁻¹ one can easily conclude that ψ* is [α, ω]-valued and ψ⁻¹ is (α, ω)-valued.
Lemma A.1.1. For ψ as above, if α < x < ω then for every real y,
y ≤ ψ(x) iff ψ*(y) ≤ x. (A.1.3)
PROOF. Since ψ*(y) is the inf of all t ∈ (α, ω) such that ψ(t) ≥ y it is clear that ψ(x) ≥ y implies x ≥ ψ*(y). Conversely, for every z > x ≥ ψ*(y) we have ψ(z) ≥ y, and thus, y ≤ lim_{z↓x} ψ(z) = ψ(x) since ψ is right continuous. □
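The Galois-type equivalence (A.1.3) can be checked mechanically. In this Python sketch the step function ψ is a hypothetical example and ψ* is computed by a crude grid search over an assumed domain, so it is an illustration rather than an exact implementation:

```python
def psi(t):
    # a nondecreasing, right-continuous step function (hypothetical example)
    if t < 2.0:
        return 0.0
    if t < 5.0:
        return 1.0
    return 3.0

def psi_star(y, lo=0.0, hi=10.0, steps=10_000):
    # numerical inf{t in (lo, hi): psi(t) >= y}; returns hi if the set is empty
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        if psi(t) >= y:
            return t
    return hi

# Galois property (A.1.3): y <= psi(x) iff psi*(y) <= x
for x in (1.0, 2.0, 3.0, 5.0, 7.0):
    for y in (-1.0, 0.5, 1.0, 2.0, 3.0):
        assert (y <= psi(x)) == (psi_star(y) <= x)
print("Galois property holds on the test grid")
```

Note how the jump of ψ at t = 5 makes ψ* flat there: every y in the gap (1, 3] is mapped to 5.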
It is clear that (A.1.3) also holds for ψ⁻¹ and inf ψ(s) ≤ y ≤ sup ψ(s) in place of ψ* and −∞ < y < ∞. Thus (1.2.9) is a special case of (A.1.3).
We already know that ψ⁻¹ is a (α, ω)-valued function with domain (inf ψ(s), sup ψ(s)). More precisely, one can easily check that ψ⁻¹ is an (α(ψ), ω(ψ))-valued function where
α(ψ) = inf{t ...} (A.1.4)
and
ω(ψ) = sup{t ...} (A.1.5)
It is clear that α ≤ α(ψ) ≤ ω(ψ) ≤ ω. Notice that in the particular case of a d.f. F we have
(A.1.6)
and
(A.1.7)
For the proof of Theorem 1.2.8 we also need the following auxiliary result.
Lemma A.1.2. If ψ is as above then ψ* is nondecreasing and left continuous. Moreover,
lim_{y→−∞} ψ*(y) = α,  lim_{y→∞} ψ*(y) = ω,
and
lim_{y↓inf ψ(s)} ψ⁻¹(y) = α(ψ),  lim_{y↑sup ψ(s)} ψ⁻¹(y) = ω(ψ). (A.1.8)
for −∞ < y < ∞ (with the convention that sup ∅ = α). An application of Lemma A.1.1 to the nondecreasing, right continuous function defined by ψ(x) = −φ(−x) leads to
lim_{y→−∞} φ**(y) = α, and lim_{y→∞} φ**(y) = β.
APPENDIX 2
The results below will provide us with the basic tools for proving asymptotic
expansions for distributions of extreme and central order statistics.
Expansion of (1 + x/n)ⁿ. We have
(1 + x/n)ⁿ = ∑_{i=0}^{n} (n choose i)(x/n)ⁱ and e⁻ˣ = ∑_{i=0}^{∞} (−x)ⁱ/i!,
and hence
e⁻ˣ(1 + x/n)ⁿ = ∑_{i=0}^{∞} β(i, n)xⁱ (A.2.1)
where
β(i, n) = ∑_{j=0}^{i} ((−1)ʲ/j!) (n choose i−j) n^{j−i}. (A.2.2)
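The Cauchy-product coefficients β(i, n) can be checked numerically. In this Python sketch (the particular n and x are arbitrary choices), the partial sums of ∑_i β(i, n)xⁱ reproduce e⁻ˣ(1 + x/n)ⁿ, and one also recovers β(1, n) = 0 and β(2, n) = −1/(2n), in agreement with the value of β(2, a) stated in Lemma A.2.1:

```python
import math

# Cauchy-product coefficients of e^{-x} * (1 + x/n)^n:
#   beta(i, n) = sum_{j=0}^{i} ((-1)^j / j!) * C(n, i - j) * n^(j - i)
def beta(i, n):
    return sum(((-1) ** j / math.factorial(j)) * math.comb(n, i - j) * n ** (j - i)
               for j in range(i + 1))

n, x = 20, 0.7
target = math.exp(-x) * (1 + x / n) ** n
approx = sum(beta(i, n) * x ** i for i in range(15))

print(abs(target - approx) < 1e-9)            # partial sum matches the product
print(abs(beta(1, n)) < 1e-12)                # beta(1, n) = 0
print(abs(beta(2, n) + 1 / (2 * n)) < 1e-12)  # beta(2, n) = -1/(2n)
```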
exp[−(2x)^{k+1}/((k + 1)aᵏ)] ≤ e⁻ˣ(1 + x/a)ᵃ exp[−∑_{i=2}^{k} (−1)^{i+1} xⁱ/(i a^{i−1})] ≤ 1 (A.2.3)
for x ≥ −a/2. Moreover, the upper bound still holds for x ≥ −a. The inequalities are strict for x ≠ 0. Since exp(x) ≥ 1 + x we obtain from (A.2.3), applied to k = 1, that
1 − 2x²/a ≤ e⁻ˣ(1 + x/a)ᵃ ≤ 1. (A.2.4)
For k = 3, 5, 7, ... the term exp[∑_{i=2}^{k} (−1)^{i+1} xⁱ/(i a^{i−1})] is a higher order approximation to e⁻ˣ(1 + x/a)ᵃ but this approximation is not an expansion of the type discussed in Section 3.2. However, a Taylor expansion of exp about zero yields the following result.
Lemma A.2.1. For every positive integer m there exists a constant C_m > 0 such that for every a ≥ 1 and x with −a/2 ≤ x ≤ a^{2/3} the following inequality holds:
|e⁻ˣ(1 + x/a)ᵃ − [1 + ∑_{i=2}^{2(m−1)} β(i, a)xⁱ]| ≤ C_m a⁻ᵐ(|x|^{2m−1} + |x|^{2m}) (A.2.5)
where the β(i, a) are real numbers which have the property max{|β(2k−1, a)|, |β(2k, a)|} ≤ C_m a⁻ᵏ for k = 1, ..., m − 1.
Moreover, we have
β(2, a) = −1/(2a),
PROOF. We have
|exp[∑_{i=2}^{2m−1} (−1)^{i+1} xⁱ/(i a^{i−1})] − ∑_{j=0}^{m−1} (1/j!) [∑_{i=1}^{2(m−1)} (−1)ⁱ x^{i+1}/((i + 1)aⁱ)]ʲ| ≤ C a⁻ᵐ x^{2m} (A.2.6)
and
|∑_{j=0}^{m−1} (1/j!) [∑_{i=1}^{2(m−1)} (−1)ⁱ x^{i+1}/((i + 1)aⁱ)]ʲ − ∑_{i} β(i, a)xⁱ| ≤ C a⁻ᵐ(|x|^{2m−1} + |x|^{2m})
where the values β(i, a) have the desired property. This together with (A.2.3) and (A.2.6) implies (A.2.5). □
By writing down the proof of Lemma A.2.1 in detail one realizes that the upper bound in (A.2.5) still holds for values x with −a ≤ x ≤ a^{2/3}.
For every positive integer n the terms β(i, a) in (A.2.5) are identical to the corresponding values, say, β*(i, a) in (A.2.2). This becomes obvious by noting that there exist A > 0 and B > 0 such that
...
for every |x| ≤ A. Now a comparison of this inequality to (A.2.5) leads to the desired identification.
g_{a,β}(x) = e^{−x²/2} [1 + (β/((a + β)a))^{1/2} x] ... (A.2.7)
where g_{i,a,β} is a polynomial of degree ≤ 3i and the coefficients of g_{i,a,β} are smaller than C_m((a + β)/(aβ))^{i/2} for i = 1, ..., m − 1.
In the proof of Lemma A.2.2 one has to choose the polynomials g_{i,a,β} in such a way that
|∑_{j=1}^{m−1} (1/j!) [∑_{i=1}^{m−1} a_{i,a,β} x^{i+2}]ʲ − ∑_{i=1}^{m−1} g_{i,a,β}(x)| ≤ C_m ((a + β)^{1/2} |x|³/(aβ))ᵐ ... (A.2.8)
... + a_{i,a,β} x⁶/2, (A.2.9)
... ≤ C_m ((a + β)/(aβ))^{m/2} (|x|^{m+2} + |x|^{3m}).
Now the proof can easily be completed by choosing the polynomials g_{i,a,β} as indicated in (A.2.8). □
If a = β then Lemma A.2.1 and Lemma A.2.2 roughly coincide for nonnegative x.
We believe that an expansion of the function g_{a,β} is of some interest in its own right; however, this function is not properly adjusted to the particular problem of computing an expansion of the distribution of a central order statistic. For this purpose one has to deal with the functions h_{a,β} defined by
h_{a,β}(x) = e^{−x²/2} [1 + (β/((a + β)a))^{1/2} x]^{a−1} [1 − (a/((a + β)β))^{1/2} x]^{β−1} (A.2.10)
or with some other variation of the function g_{a,β} according to the standardization of the distribution of the order statistic. By using Taylor expansions of ... and ... about 1, one can easily deduce from Lemma A.2.2 the following:
h_{1,a,β}(x) = g_{1,a,β}(x) − (a/((a + β)β))^{1/2} ...
h_{2,a,β}(x) = g_{2,a,β}(x) − (a/((a + β)β))^{1/2} x g_{1,a,β}(x) ...
APPENDIX 3
The aim of the following lines is to extend some of the results of Section 3.3
to finite signed measures. Moreover, we prove some highly technical inequalities which do not belong to the necessary prerequisites for the understanding
of the main ideas of this volume. However, these inequalities are useful for
certain computations.
In the sequel, let ν_i be a finite signed measure (on a measurable space (S, ℬ)) represented by the density f_i w.r.t. a dominating measure μ.
(A.3.1)
‖ν_0 − ν_1‖ ... |ν_0(B) − ν_1(B)| (A.3.2)
‖ν_0 − ν_1‖ = ∫ |f_0 − f_1| dμ.
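For intuition, here is a small discrete illustration of this L¹ representation (the numbers are invented): with μ the counting measure on {0, 1, 2, 3}, the integral of |f₀ − f₁| is a finite sum, and for two probability measures the supremum of |ν₀(B) − ν₁(B)| over sets B is attained at B = {f₀ > f₁} and equals half of it:

```python
# Discrete illustration (invented numbers): densities w.r.t. the counting
# measure on {0, 1, 2, 3}.
f0 = [0.1, 0.4, 0.3, 0.2]
f1 = [0.25, 0.25, 0.25, 0.25]

l1 = sum(abs(a - b) for a, b in zip(f0, f1))   # integral of |f0 - f1| d(mu)

# sup_B |nu_0(B) - nu_1(B)| is attained at B = {f0 > f1} for probability
# measures, and equals half of the L1 distance.
B = [i for i in range(4) if f0[i] > f1[i]]
sup_diff = sum(f0[i] - f1[i] for i in B)

print(round(l1, 10))            # L1 distance of the densities
print(round(2 * sup_diff, 10))  # twice the sup over sets: the same number
```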
We note that under the condition that ν_0(S) = ν_1(S) we have again
... ∫_B |g − f| dμ + ∫_{B^c} |g_0 − f_0| dμ ...
Moreover, since g ≥ ...,
1 − ... ≤ ∫ f dμ + ∫_B |g − f| dμ + ∫_{B^c} |g_0 − f_0| dμ.
(ν_1 × ν_2)(B) = ∫_B f_1(x_1) f_2(x_2) d(μ_1 × μ_2)(x_1, x_2).
Lemma A.3.3. Let ν_i and λ_i be finite signed measures for i = 1, ..., k. Then,
...
PROOF. For notational simplicity we will prove the assertion for k = 2 only. The general case can easily be proved by induction over k.
For measurable sets A in the product space we get by Fubini's theorem
|(ν_1 × ν_2)(A) − (ν_1 × λ_2)(A)| ≤ ‖ν_1‖ sup_{x'} |ν_2(A_{x'}) − λ_2(A_{x'})|. (2)
Thus, combining (1) and (2) we get the desired inequality in the case of k = 2. The proof is complete. □
Notice that Lemma 3.3.7 is an immediate consequence of Lemma A.3.3. Moreover, Lemma A.3.3 is an extension of the following well-known formula:
|∏_{i=1}^{k} a_i − ∏_{i=1}^{k} b_i| ≤ ∑_{i=1}^{k} (∏_{j<i} |a_j|) |a_i − b_i| (∏_{j>i} |b_j|),
which holds for all real (as well as complex) numbers a_i and b_i.
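This product formula rests on a telescoping identity, which can be checked numerically. A Python sketch with randomly chosen (purely illustrative) numbers: ∏ a_i − ∏ b_i = ∑_i (∏_{j<i} a_j)(a_i − b_i)(∏_{j>i} b_j), and the bound then follows from the triangle inequality:

```python
import random

random.seed(1)

def prod(xs):
    p = 1.0
    for v in xs:
        p *= v
    return p

a = [random.uniform(-2, 2) for _ in range(6)]
b = [random.uniform(-2, 2) for _ in range(6)]

# telescoping identity:
#   prod(a) - prod(b) = sum_i prod(a[:i]) * (a[i] - b[i]) * prod(b[i+1:])
lhs = prod(a) - prod(b)
rhs = sum(prod(a[:i]) * (a[i] - b[i]) * prod(b[i + 1:]) for i in range(6))

# ... and the resulting bound via the triangle inequality
bound = sum(abs(prod(a[:i])) * abs(a[i] - b[i]) * abs(prod(b[i + 1:]))
            for i in range(6))

print(abs(lhs - rhs) < 1e-9)     # identity holds
print(abs(lhs) <= bound + 1e-9)  # bound holds
```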
Corollary A.3.4. For probability measures Q_i and finite signed measures λ_i with λ_i(S_i) = 1 we have
‖∏_{i=1}^{k} Q_i − ∏_{i=1}^{k} λ_i‖ ≤ exp[2 ∑_{i=1}^{k} ‖Q_i − λ_i‖] ∑_{i=1}^{k} ‖Q_i − λ_i‖. (A.3.3)
PROOF. Check that
|∏_{j≤i} λ_j| ≤ ...
The proof of Lemma A.3.3 gives a little bit more than stated there. For every measurable set A we obtain
...
H(Q_0, Q_1) ≤ [2Q_0(B^c) + ∫_B (−log(f_1/f_0)) dQ_0]^{1/2}. (A.3.4)
PROOF. According to (3.3.5) we have to establish a lower bound of ∫ (f_1 f_0)^{1/2} dμ. W.l.g. let Q_0(B) > 0. Since exp(x) ≥ 1 + x, we obtain from the Jensen inequality that
∫_B (f_1 f_0)^{1/2} dμ = Q_0(B) ∫ (f_1/f_0)^{1/2} d(Q_0/Q_0(B)) ≥ Q_0(B) exp[(2Q_0(B))^{−1} ∫_B log(f_1/f_0) dQ_0] ≥ Q_0(B) + 2^{−1} ∫_B log(f_1/f_0) dQ_0.
At the end of this section we will discuss in detail the special case of m = 1.
Lemma A.3.6. Assume that Q_i and ν_i satisfy the conditions above. Let 1 + g_i be a Q_i-density of ν_i. Then, for every m ∈ {0, ..., k},
‖∏_{i=1}^{k} ν_i − h_m ∏_{i=1}^{k} Q_i‖ ≤ ...
where
h_m(x_1, ..., x_k) = 1 + ∑_{j=1}^{m} ∑_{1≤i_1<⋯<i_j≤k} ∏_{r=1}^{j} g_{i_r}(x_{i_r}).
PROOF. Notice that
∏_{i=1}^{k} (1 + a_i) = 1 + ∑_{j=1}^{k} ∑_{1≤i_1<⋯<i_j≤k} ∏_{r=1}^{j} a_{i_r}.
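The combinatorial expansion of ∏(1 + a_i) used in this proof can be verified directly. A short Python check (the a_i are arbitrary invented numbers):

```python
import itertools

# Identity: prod_{i=1}^k (1 + a_i)
#         = 1 + sum_{j=1}^k sum_{1<=i_1<...<i_j<=k} prod_r a_{i_r}
a = [0.3, -0.7, 1.2, 0.05]
k = len(a)

lhs = 1.0
for ai in a:
    lhs *= 1.0 + ai

rhs = 1.0
for j in range(1, k + 1):
    for idx in itertools.combinations(range(k), j):
        term = 1.0
        for i in idx:
            term *= a[i]
        rhs += term

print(abs(lhs - rhs) < 1e-12)   # expansion identity holds
```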
and
∑_{i=m}^{∞} zⁱ/i! ≤ exp(z) zᵐ/m! for z ≥ 0.
...
∑_{i=1}^{k} D(Q_i, ν_i)² =: R_k (A.3.6)
...
≤ 8^{−1} exp[2^{−1} ∑_{i=1}^{k} D(Q_i, ν_i)²] ∑_{i=1}^{k} ...
and hence
...
This shows that for k → ∞ further insight into the variational distance of product measures may be gained by means of the central limit theorem.
Bibliography
Falk, M. (1986b). On the estimation of the quantile density function. Statist. Probab.
Letters 4, 69- 73.
Falk, M. (1989a). Best attainable rate of joint convergence of extremes. In: Extreme
Value Theory, Eds. J. Hüsler and R.-D. Reiss, pp. 1-9. Lecture Notes in Statistics
51. New York: Springer.
Falk, M. (1989b). A note on uniform asymptotic normality of intermediate order
statistics. Ann. Inst. Statist. Math., Ser. A.
Falk, M. and Kohne, W. (1986). On the rate at which the sample extremes become
independent. Ann. Probab. 14, 1339-1346.
Falk, M. and Reiss, R.-D. (1988). Independence of order statistics. Ann. Probab. 16,
854-862.
Falk, M. and Reiss, R.-D. (1989). Weak convergence of smoothed and nonsmoothed
bootstrap quantile estimates. Ann. Probab. 17.
Feldman, D. and Tucker, H.G. (1966). Estimation of non-unique quantiles. Ann. Math.
Statist. 37,451-457.
Feller, W. (1972). An Introduction to Probability Theory and its Applications. Vol. 2,
2nd ed. New York: Wiley.
Ferguson, T.S. (1967). Mathematical Statistics. New York: Academic Press.
Finkelstein, B.V. (1953). Limiting distribution of extreme terms of a variational series
of a two-dimensional random variable. Dokl. Ak. Nauk. S.S.S.R. 91, 209-211 (in
Russian).
Fisher, R.A. (1922). On the mathematical foundation of theoretical statistics. Phil.
Trans. Roy. Soc. A 222, 309-368. Reprint in: Collected Papers of R.A. Fisher, Vol.
I, Ed. J.H. Bennett, pp. 276-335. University of Adelaide.
Fisher, R.A. and Tippett, L.H.C. (1928). Limiting forms of the frequency distribution
of the largest or smallest member of a sample. Proc. Camb. Phil. Soc. 24, 180-190.
Floret, K. (1981). Mass- und Integrationstheorie. Stuttgart: Teubner.
Fréchet, M. (1927). Sur la loi de probabilité de l'écart maximum. Ann. de la Soc.
Polonaise de Math. 6, 93-116.
Galambos, J. (1975). Order statistics of samples from multivariate distributions. J.
Amer. Statist. Assoc. 70, 674-680.
Galambos, J. (1984). Order statistics. In: Handbook of Statistics. Vol. 4, Eds. P.R.
Krishnaiah and P.K. Sen, pp. 359-382. Amsterdam: North-Holland.
Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics. 2nd ed.
Malabar, Florida: Krieger.
Geffroy, J. (1958/59). Contributions à la théorie des valeurs extrêmes. Publ. Inst. Statist.
Univ. Paris 7/8, 37-185.
Gini, C. and Galvani, L. (1929). Di talune estensioni dei concetti di media ai caratteri
qualitativi. Metron 8. Partial English translation in: J. Amer. Statist. Assoc. 25,
448-450.
Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire.
Ann. Math. 44, 423-453.
Goldie, C.M. and Smith, R.L. (1987). Slow variation with remainder: Theory and
applications. Quart. J. Math. Oxford 38,45-71.
Gomes, M.1. (1978). Some probabilistic and statistical problems in extreme value
theory. Ph.D. Thesis, University of Sheffield.
Gomes, M.I. (1981). An i-dimensional limiting distribution function of largest values
and its relevance to the statistical theory of extremes. In: Statistical Distribution in
Scientific Work, Eds. C. Taillie et al., Vol. 6, pp. 389-410. Dordrecht: Reidel.
Gomes, M.I. (1984). Penultimate limiting forms in extreme value theory. Ann. Inst.
Statist. Math., Ser. A, 36, 71-85.
Gross, A.J. (1975). Survival Distributions: Reliability Applications in the Biomedical
Sciences. New York: Wiley.
Hill, B.M. (1975). A simple approach to inference about the tail of a distribution. Ann.
Statist. 3, 1163-1174.
Hillion, A. (1983). On the use of some variation distance inequalities to estimate the
difference between sample and perturbed sample. In: Specifying Statistical Models,
Eds. J.P. Florens et al., pp. 163-175. Lecture Notes in Statistics 16. New York:
Springer.
Hodges, J.L. Jr. and Lehmann, E.L. (1967). On medians and quasi medians. J. Amer.
Statist. Assoc. 62, 926-931.
Hodges, J.L. Jr. and Lehmann, E.L. (1970). Deficiency. Ann. Math. Statist. 41,783-801.
Hoeffding, W. and Wolfowitz, J. (1958). Distinguishability of sets of distributions. Ann.
Math. Statist. 29, 700-718.
Hosking, J.R.M. (1985). Maximum-likelihood estimation of the parameter of the
generalized extreme-value distribution. Applied Statistics 34, 301-310.
Huang, J.S. and Ghosh, M. (1982). A note on the strong unimodality of order statistics.
J. Amer. Statist. Assoc. 77, 929-930.
Hüsler, J. and Reiss, R.-D. (1989). Maxima of normal random vectors: Between
independence and complete dependence. Statist. Probab. Letters 7.
Hüsler, J. and Schüpbach, M. (1988). On simple block estimators for the parameters
of the extreme-value distribution. Commun. Statist.-Simula. 15, 61-76.
Hüsler, J. and Tiago de Oliveira, J. (1986). The usage of the largest observations
for parameter and quantile estimation for the Gumbel distribution; an efficiency
analysis. Publ. Inst. Stat. Univ. 33, 41-56.
Ibragimov, J.A. (1956). On the composition of unimodal distributions. Theory Probab.
Appl. 1,225-260.
Ibragimov, J.A. and Has'minskii, R.Z. (1981). Statistical Estimation. Springer-Verlag,
Berlin.
Iglehardt, D.L. (1976). Simulating stable stochastic systems; VI. Quantile estimation.
J. Assoc. Comput. Mach. 23, 347-360.
Ikeda, S. (1963). Asymptotic equivalence of probability distributions with applications
to some problems of asymptotic independence. Ann. Inst. Statist. Math. 15,87-116.
Ikeda, S. (1975). Some criteria for uniform asymptotic equivalence of real probability
distributions. Ann. Inst. Statist. Math. 27,421-428.
Ikeda, S. and Matsunawa, T. (1970). On asymptotic independence of order statistics.
Ann. Inst. Statist. Math. 22, 435-449.
Ikeda, S. and Matsunawa, T. (1972). On the uniform asymptotic joint normality of
sample quantiles. Ann. Inst. Statist. Math. 24, 33-52.
Ikeda, S. and Nonaka, Y. (1983). Uniform asymptotic joint normality of a set of
increasing number of sample quantiles. Ann. Inst. Statist. Math. 35, Ser. A, 329-341.
Isogai, T. (1985). Some extensions of Haldane's multivariate median and its applications. Ann. Inst. Statist. Math. 37, Ser. A, 289-301.
Ivchenko, G.I. (1971). On limit distributions for the order statistics of the multinomial
distribution. Theory Probab. Appl. 16, 102-115.
Ivchenko, G.I. (1974). On limit distributions for middle order statistics for double
sequence. Theory Probab. App!. 19,267-277.
Jacod, J. and Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Berlin:
Springer.
Janssen, A. (1988). Uniform convergence of sums of order statistics to stable laws.
Probab. Th. ReI. Fields 78, 261-272.
Janssen, A. and Reiss, R.-D. (1988). Comparison of location models of Wei bull type
samples and extreme value processes. Probab. Th. ReI. Fields 78, 273-292.
Joag-Dev, K. (1983). Independence via uncorrelatedness under certain dependence
structures. Ann. Probab. 11, 1037-1041.
Joe, H. (1987). Estimation of quantiles of the maximum of N observations. Biometrika
74,347-354.
Mann, N.R. (1984). Statistical estimation of the Weibull and Frechet distributions.
In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 81-89.
Dordrecht: Reidel.
Marshall, A.W. and Olkin, I. (1983). Domains of attraction of multivariate extreme
value distributions. Ann. Probab. 11, 168-177.
Matsunawa, T. (1975). On the error evaluation of the joint normal approximation for
sample quantiles. Ann. Inst. Statist. Math. 27,189-199.
Matsunawa, T. and Ikeda, S. (1976). Uniform asymptotic distribution of extremes. In:
Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 419-432. Tokyo: Shinko Tsusho.
Michel, R. (1975). An asymptotic expansion for the distribution of asymptotic maximum likelihood estimators of vector parameters. J. Multivariate Anal. 5,67-85.
Miebach, B. (1977). Asymptotische Theorie für Familien von Maßen mit Lokalisations- und Dispersionsparameter. Diploma Thesis, University of Cologne.
Mises von, R. (1923). Über die Variationsbreite einer Beobachtungsreihe. Sitzungsberichte Berliner Math. Ges. 22, 3-8.
Mises von, R. (1936). La distribution de la plus grande de n valeurs. Rev. Math. Union
Interbalcanique 1, 141-160. Reproduced in Selected Papers of Richard von Mises,
Amer. Math. Soc. 2 (1964), 271-294.
Miyamoto, Y. (1976). Optimum spacings for goodness of fit tests based on sample
quantiles. In: Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 475-483. Tokyo:
Shinko Tsusho.
Montfort, M.A.J. van (1982). Modellen voor maximum en minima, schattingen en
betrouwbaarheidsintervallen, kreuze tussen modellen, Agricultural University
Wageningen, Netherlands, Dept. Math., Statist. Division, Technical Note 82-02.
Montfort, M.A.J. van and Gomes, I.M. (1985). Statistical choice of extremal models
for complete and censored data. J. Hydrology 77, 77-87.
Mood, A. (1941). On the joint distribution of the median in sample from a multivariate
population. Ann. Math. Statist. 12,268-278.
Moore, D.S. and Yackel, J.W. (1977). Large sample properties of nearest neighbour
density function estimates. In: Statistical Decision Theory and Related Topics, Eds.
S.S. Gupta and D.S. Moore, pp. 269-279. New York: Academic Press.
Mosteller, F. (1946). On some useful inefficient statistics. Ann. Math. Statist. 17,
377-408.
Nadaraya, E.A. (1964). Some new estimates for distribution functions. Theory Probab.
Appl. 10, 186-190.
Nagaraja, H.N. (1982). On the non-Markovian structure of discrete order statistics. J.
Statist. Plann. Inference 7, 29-33.
Nagaraja, H.N. (1986). Structure of discrete order statistics. J. Statist. Plann. Inference
13, 165-177.
Nelson, W. (1982). Applied Life Data Analysis. New York: Wiley.
Nowak, W. and Reiss, R.-D. (1983). Asymptotic expansions of distributions of central
order statistics under discrete distributions. Technical Report 101, University of
Siegen.
Oja, H. and Niinimaa, A. (1985). Asymptotic properties of the generalized median in
the case of multivariate normality. J. Roy. Statist. Soc., Ser. B, 47,372-377.
O'Reilly, F.J. and Quesenberry, C.P. (1973). The conditional probability integral
transformation and applications to obtain composite chi-square goodness-of-fit
tests. Ann. Statist. 1, 74-83.
Pantcheva, E.I. (1985). Limit theorems for extreme order statistics under nonlinear
normalization. In: Stability Problems for Stochastic Models, Eds. V.V. Kalashnikov
and V.M. Zolotarev, pp. 284-309. Lecture Notes in Mathematics 1155, Berlin:
Springer.
Parzen, E. (1962) On estimation of a probability density function and mode. Ann.
Math. Statist. 33, 1065-1076.
Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc. 74,
105-121.
Pearson, K. (1902). Note on Francis Galton's problem. Biometrika 1,390-399.
Pearson, K. (1920). On the probable errors of frequency constants. Biometrika 13,
113-132.
Pfanzagl, J. (1973a). Asymptotically optimum estimation and test procedures. In: Proc.
Prague Symp. Asymptotic Statistics, Vol. 1, Ed. J. Hajek, pp. 201-272. Prague:
Charles University.
Pfanzagl, J. (1973b). The accuracy of the normal approximation for estimates of vector
parameters. Z. Wahrsch. verw. Gebiete 25, 171-198.
Pfanzagl, J. (1973c). Asymptotic expansions related to minimum contrast estimators.
Ann. Statist. 1,993-1026.
Pfanzagl, J. (1975). Investigating the quantile of an unknown distribution. In: Statistical
Methods in Biometry, Ed. W.J. Ziegler, pp. 111-126. Basel: Birkhauser.
Pfanzagl, J. (1982). Contributions to a General Asymptotic Statistical Theory. (With
the assistance of W. Wefelmeyer). Lecture Notes in Statistics 13. New York: Springer.
Pfanzagl, J. (1985). Asymptotic Expansions for General Statistical Models. (With the
assistance of W. Wefelmeyer). Lecture Notes in Statistics 31. New York: Springer.
Pickands, J. (1967). Sample sequences of maxima. Ann. Math. Statist. 38, 1570-1574.
Pickands, J. (1968). Moment convergence of sample extremes. Ann. Math. Statist. 39,
881-889.
Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Statist. 3,
119-131.
Pickands, J. (1981). Multivariate extreme value distributions. Proc. 43rd Session of the
ISI (Buenos Aires), 859-878.
Pickands, J. (1986). The continuous and differentiable domains of attractions of the
extreme value distributions. Ann. Probab. 14,996-1004.
Pitman, E.J.G. (1979). Some Basic Theory for Statistical Inference. London: Chapman
and Hall.
Plackett, R.L. (1976). In: Discussion of Professor Barnett's Paper. J.R. Statist. Soc., Ser.
A, 139,344-346.
Polfeldt, T. (1970). Asymptotic results in non-regular estimation. Skand. Aktuar.,
Suppl. 1-2,2-78.
Prakasa Rao, B.L.S. (1983). Nonparametric Functional Estimation. Orlando: Academic Press.
Puri, M.L. and Ralescu, S.S. (1986). Limit theorems for random central order statistics. In: Adaptive Statistical Procedures and Related Topics, Ed. J. van Ryzin,
pp. 447-475. IMS Lecture Notes 8.
Pyke, R. (1965). Spacings. J. Roy. Statist. Soc., Ser. B. 27, 395-436. Discussion: 437-449.
Pyke, R. (1972). Spacings revisited. In: Proc. 6th Berkeley Symp., Math. Statist.
Probability, Vol. 1, Eds. L.M. Le Cam et aI., pp. 417-427. Berkeley: Univ. California
Press.
Radtke, M. (1988). Konvergenzraten und Entwicklungen unter von Mises Bedingungen der Extremwerttheorie. Ph.D. Thesis, University of Siegen.
Ramachandran, G. (1984). Approximate values for the moments of extreme order
statistics in large samples. In: Statistical Extremes and Applications, Ed. 1. Tiago de
Oliveira, pp. 563-578. Dordrecht: Reidel.
Rao, J.S. and Kuo, M. (1984). Asymptotic results on the Greenwood statistic and some
of its generalizations. J. Roy. Statist. Soc., Ser. B, 46,228-237.
Raoult, J.P., Criticou, D. and Terzakis, D. (1983). The probability integral transformation for not necessarily absolutely continuous distribution functions, and its application to goodness-of-fit tests. In: Specifying Statistical Models, Ed. J.P. Florens et aI.,
pp. 36-49. New York: Springer.
Resnick, S.I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied
Probability. Vol. 4. New York: Springer.
Rice, J. and Rosenblatt, M. (1976). Estimation of the log survivor function and hazard
function. Sankhya, Ser. A, 38, 60-78.
Rootzen, H. (1984). Attainable rates of convergence of maxima. Statist. Probab. Letters
2,219-221.
Rootzen, H. (1985). Asymptotic distributions of order statistics from stationary normal
sequences. In: Contribution to Probability and Statistics in Honour of Gunnar
Blom, Eds. J. Lanke and G. Lindgren, pp. 291-302. University of Lund.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. Ann. Math. Statist. 23,
470-472.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.
Rosengard, A. (1962). Étude des lois-limites jointes et marginales de la moyenne et des
valeurs extrêmes d'un échantillon. Publ. Inst. Statist. Univ. Paris 11, 3-53.
Rossberg, H.J. (1965). Die asymptotische Unabhängigkeit der kleinsten und größten
Werte einer Stichprobe vom Stichprobenmittel. Math. Nachr. 28, 305-318.
Rossberg, H.J. (1967). Über das asymptotische Verhalten der Rand- und Zentralglieder
einer Variationsreihe (II). Publ. Math. Debrecen 14,83-90.
Rossberg, H.J. (1972). Characterization of the exponential and the Pareto distribution
by means of some properties of the distributions which the differences and quotients
of order statistics are subject to. Math. Operationsforsch. Statist. 3,207-316.
Rüschendorf, L. (1985a). Two remarks on order statistics. J. Statist. Plann. Inference
11,71-74.
Rüschendorf, L. (1985b). The Wasserstein distance and approximation theorems. Z.
Wahrsch. verw. Geb. 66,117-129.
Ryzin, J. van (1973). A histogram method of density estimation. Commun. Statist. 2,
493-506.
Sen, P.K. (1968). Asymptotic normality of sample quantiles for m-dependent processes.
Ann. Math. Statist. 39, 1724-1730.
Sen, P.K. (1972). On the Bahadur representation of sample quantiles for sequences of
cp-mixing random variables. J. Multivariate Anal. 2, 77-95.
Sendler, W. (1975). A note on the proof of the zero-one law of Blum and Pathak. Ann.
Probab. 3, 1055-1058.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York:
Wiley.
Shaked, M. and Tong, Y.L. (1984). Stochastic ordering of spacings from dependent
random variables. In: Inequalities in Statistics and Probability. IMS Lecture Notes
5,141-149.
Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to
Statistics. New York: Wiley.
Sibuya, M. (1960). Bivariate extreme statistics. Ann. Inst. Stat. Math. 11, 195-210.
Siddiqui, M.M. (1960). Distribution of quantiles in samples from a bivariate population. J. Res. Nat. Bureau Standards 64, Ser. B, 124-150.
Singh, K. (1979). Representation of quantile processes with non-uniform bounds.
Sankhya, Ser. A, 41, 271-277.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9,
1187-1195.
Smid, B. and Stam, A.J. (1975). Convergence in distribution of quotients of order
statistics. Stoch. Proc. Appl. 3,287-292.
Smirnov, N.V. (1935). Über die Verteilung des allgemeinen Gliedes in der Variationsreihe. Metron 12, 59-81.
Smirnov, N.V. (1944). Approximation of distribution laws of random variables by
empirical data. Uspechi Mat. Nauk 10, 179-206 (in Russian).
Smirnov, N.V. (1949). Limit distributions for the term of a variational series. Trudy
Mat. Inst. Steklov 25, 1-60. (In Russian). English translation in Amer. Math. Soc.
Transl. (1), 11 (1952), 82-143.
Smirnov, N.V. (1967). Some remarks on limit laws for order statistics. Theory Probab.
Appl. 12,337-339.
Smith, R.L. (1982). Uniform rates of convergence in extreme value theory. Adv. Appl.
Probab. 14,600-622.
Smith, R.L. (1984). Threshold methods for sample extremes. In: Statistical Extremes
and Applications, Ed. J. Tiago de Oliveira, pp. 621-638. Dordrecht: Reidel.
Smith, R.L. (1985a). Maximum likelihood estimation in a class of non-regular cases.
Biometrika 72, 67-92.
Smith, R.L. (1985b). Statistics of extreme values. Proc. 45th Session of the lSI, Vol. 4
(Amsterdam), 26.1.
Smith, R.L. (1986). Extreme value theory based on the r largest annual events. J.
Hydrology 86, 27-43.
Smith, R.L. (1987). Estimating tails of probability distributions. Ann. Statist. 15,
1174-1207.
Smith, R.L. and Weissman, I. (1987). Large deviations of tail estimators based on the
Pareto approximation. J. Appl. Probab. 24, 619-630.
Smith, R.L., Tawn, J.A. and Yuen, H.K. (1987). Statistics of multivariate extremes.
Preprint, University of Surrey.
Sneyers, R. (1984). Extremes in meteorology. In: Statistical Extremes and Applications,
Ed. J. Tiago de Oliveira, pp. 235-252. Dordrecht: Reidel.
Stigler, S.M. (1973). Studies in the history of probability and statistics. XXXII. Biometrika 60, 439-445.
Strasser, H. (1985). Mathematical Theory of Statistics. De Gruyter Studies in Math.
7, Berlin: De Gruyter.
Stute, W. (1982). The oscillation behaviour of empirical processes. Ann. Probab. 10,
86-107.
Sukhatme, P.V. (1937). Tests of significance for samples of the χ²-population with two
degrees of freedom. Ann. Eugenics 8, 52-56.
Sweeting, T.J. (1985). On domains of uniform local attraction in extreme value theory.
Ann. Probab. 13, 196-205.
Teugels, J.L. (1981). Limit theorems on order statistics. Ann. Probab. 9, 868-880.
Thompson, W.R. (1936). On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form. Ann. Math. Statist. 7,
122-128.
Tiago de Oliveira, J. (1958). Extremal distributions. Rev. Fac. Cienc. Univ. Lisboa A
7, 215-227.
Tiago de Oliveira, J. (1961). The asymptotic independence of the sample means and
the extremes. Rev. Fac. Cienc. Univ. Lisboa A 8, 299-310.
Tiago de Oliveira, J. (1963). Decision results for the parameters of the extreme value
(Gumbel) distribution based on the mean and standard deviation. Trabajos de
Estadistica 14, 61-81.
Tiago de Oliveira, J. (1984). Bivariate models for extremes; statistical decisions.
In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 131-153.
Dordrecht: Reidel.
Tippett, L.H.C. (1925). On the extreme individuals and the range of samples taken
from a normal population. Biometrika 17, 364-387.
Torgersen, E.N. (1976). Comparison of statistical experiments. Scand. J. Statist. 3,
186-208.
Tusnády, G. (1974). On testing density functions. Period. Math. Hungar. 5, 161-169.
Umbach, D. (1981). A note on the median of a distribution. Ann. Inst. Statist. Math.
33, Ser. A, 135-140.
Uzgoren, N.T. (1954). The asymptotic development of the distribution of the extreme
values of a sample. In: Studies in Mathematics and Mechanics. Presented to Richard
von Mises, pp. 346-353. New York: Academic Press.
Vaart, H.P. van der (1961). A simple derivation of the limiting distribution function of
a sample quantile with increasing sample size. Statist. Neerlandica 15, 239-242.
Walsh, J.E. (1969). Asymptotic independence between largest and smallest of a set of
independent observations. Ann. Inst. Statist. Math. 21, 287-289.
Walsh, J.E. (1970). Sample sizes for appropriate independence of largest and smallest
order statistic. J. Amer. Statist. Assoc. 65, 860-863.
Watson, G. and Leadbetter, M. (1964a). Hazard analysis I. Biometrika 51, 175-184.
Watson, G. and Leadbetter, M. (1964b). Hazard analysis II. Sankhya, Ser. A, 26,
101-116.
Watts, V., Rootzen, H. and Leadbetter, M.R. (1982). On limiting distributions of
intermediate order statistics from stationary sequences. Ann. Probab. 10, 653-662.
Weinstein, S.B. (1973). Theory and applications of some classical and generalized
asymptotic distributions of extreme values. IEEE Trans. Inf. Theory 19, 148-154.
Weiss, L. (1959). The limiting joint distribution of the largest and smallest sample
spacings. Ann. Math. Statist. 30, 590-593.
Weiss, L. (1964). On the asymptotic joint normality of quantiles from a multivariate
distribution. J. Res. Nat. Bureau Standards 68, Ser. B, 65-66.
Weiss, L. (1965). On asymptotic sampling theory for distributions approaching the
uniform distribution. Z. Wahrsch. verw. Gebiete 4, 217-221.
Weiss, L. (1969a). The joint asymptotic distribution of the k-smallest sample spacings.
J. Appl. Probab. 6, 442-448.
Weiss, L. (1969b). The asymptotic joint distribution of an increasing number of sample
quantiles. Ann. Inst. Statist. Math. 21, 257-263.
Weiss, L. (1969c). Asymptotic distributions of quantiles in some nonstandard cases.
In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 343-348.
Cambridge: Cambridge Univ. Press.
Weiss, L. (1971). Asymptotic inference about a density function at an end of its range.
Nav. Res. Logist. Quart. 18, 111-114.
Weiss, L. (1973). Statistical procedures based on a gradually increasing number of order
statistics. Commun. Statist. 2, 95-114.
Weiss, L. (1974). The asymptotic sufficiency of a relatively small number of order
statistics in tests of fit. Ann. Statist. 2, 795-802.
Weiss, L. (1976). The normal approximations to the multinomial with an increasing
number of classes. Nav. Res. Logist. Quart. 23, 139-149.
Weiss, L. (1977). Asymptotic properties of Bayes tests of nonparametric hypotheses.
In: Statistical Decision Theory and Related Topics, II, Eds. D.S. Moore and S.S.
Gupta, pp. 439-450. New York: Academic Press.
Weiss, L. (1978). The error in the normal approximation to the multinomial with an
increasing number of classes. Nav. Res. Logist. Quart. 25, 257-261.
Weiss, L. (1979a). The asymptotic distribution of order statistics. Nav. Res. Logist.
Quart. 26, 437-445.
Weiss, L. (1979b). Asymptotic sufficiency in a class of nonregular cases. Selecta Statistica Canadiana V, 141-150.
Weiss, L. (1980). The asymptotic sufficiency of sparse order statistics in tests of fit with
nuisance parameters. Nav. Res. Logist. Quart. 27, 397-406.
Weiss, L. (1982). Asymptotic joint normality of an increasing number of multivariate order statistics and associated cell frequencies. Nav. Res. Logist. Quart. 29,
75-96.
Weissman, I. (1975). Multivariate extremal processes generated by independent nonidentically distributed random variables. J. Appl. Probab. 12, 477-487.
Author Index
A
Alam, K., 48
Ali, M.M., 238
Anderson, C.W., 202, 203
Arnold, B.C., 63
B
Bahadur, R.R., 216, 228
Bain, L.J., 204
Balkema, A.A., 146, 148
Barndorff-Nielsen, O., 198
Barnett, V., 63, 66, 67, 257
Becker, A., 63
Beran, R., 228
Berman, S.M., 238
Bernoulli, N., 62
Bhattacharya, R.N., 69, 149, 231
Bickel, P.J., 61, 228, 271
Bloch, D.A., 271
Blum, J.R., 104
Boos, D.D., 291
Bortkiewicz, L. von, 62
Brown, B.M., 271
Brozius, H., 82
C
Chernoff, H., 210
Chibisov, D.M., 195, 202
D
David, F.N., 205, 226
David, H.A., 63
Davis, C.E., 271
Davis, R.A., 202
Deheuvels, P., 205, 285
Dodd, E.L., 62
Dronkers, J.J., 203
Du Mouchel, W., 291
Dwass, M., 149
Dziubdziela, W., 68
E
Eddy, W.F., 82
Efron, B., 220
Egorov, V.A., 149
F
Falk, M., 102, 122, 149, 159, 164, 185, 186, 187, 195, 199, 202, 203, 224, 264, 265, 271, 291, 317
Feldman, D., 148
Feller, W., 227
Ferguson, T.S., 103
Finkelstein, B.V., 238
Fisher, R.A., 62, 172, 201, 202
Floret, K., 71
Frechet, M., 63
Freedman, D.A., 228
G
Galambos, J., 23, 37, 43, 63, 76, 80, 82, 162, 180, 201, 202, 228, 235, 238
Gale, J.D., 82
Galvani, L., 81
Gastwirth, J.L., 210, 271
Gather, U., 63
Geffroy, J., 238
Gini, C., 81
Gnedenko, B., 155, 201
Goldie, C.M., 202, 204
Gomes, M.I., 286, 290
Gosh, J.K., 149
Gosh, M., 48
Gross, A.J., 204
Guilbaud, O., 36
Gumbel, E.J., 62, 63, 149, 257
H
Haan, L. de, 63, 82, 146, 148, 155, 195,
201, 202, 291
Hájek, J., 58, 61
Haldane, J.B.S., 203
Hall, P., 68, 81, 202, 203, 285, 291
Hall, W.J., 202
Harrell, F.E., 271
Harter, H.L., 62
Has'minskii, R.Z., 274, 298
I
Ibragimov, I.A., 103, 274, 298
Iglehart, D.L., 148
Ikeda, S., 2, 104, 149, 150, 203
Isogai, T., 81
Ivchenko, G.I., 150
K
Kabanov, Yu., 204
Karr, A.F., 204
Kendall, M.G., 150, 226
Kiefer, J., 148, 218, 228
Kinnison, R.R., 63
Klenk, A., 228
Kohne, W., 71, 149, 200
Kolchin, V.F., 150
Kotz, S., 63, 277, 289, 290
Kuan, K.S., 238
Kuo, M., 228
L
N
Nadaraya, E.A., 271
Nagaraja, H.N., 63
Nelson, W., 204
Nevzorov, V.B., 149
Niinimaa, A., 81
Nolle, G., 315
Nonaka, Y., 150
Nowak, W., 150
O
Olkin, I., 238
O'Reilly, F.J., 82
Q
Quesenberry, C.P., 82
R
Rachev, S.T., 202
Radtke, M., 176, 199, 200, 204
Ralescu, S.S., 149
Ramachandran, G., 227
Rao, R.R., 69, 142,231
Raoult, J.P., 82
Reiss, R.-D., 68, 102, 103, 104, 122, 124, 128, 138, 147, 149, 150, 175, 196, 200, 202, 203, 224, 226, 233, 239, 262, 268, 270, 271, 286, 290, 291, 296, 317
Rényi, A., 36, 63, 148
Resnick, S.I., 63, 76, 202, 204, 227,
228, 235, 238, 291
Révész, P., 150
Rice, J., 271
Rogge, L., 148
Rootzen, H., 23, 63, 146, 149, 202, 203
Rosenblatt, M., 82, 271
Rosengard, A., 149
Rossberg, H.J., 43, 149
Rüschendorf, L., 63, 82
Ryzin, J. van, 271
S
Schafer, R.E., 204
Schüpbach, M., 290
Sen, P.K., 148, 228
Sendler, W., 104
Serfling, R.J., 104, 150, 227, 289
Shiryaev, A.N., 204
Shorack, G.R., 150
Sibuya, M., 236, 238
Šidák, Z., 58, 61
Siddiqui, M.M., 237, 238, 271
Singh, K., 224, 228
Singpurwalla, N.D., 204
Smid, B., 149
Smirnov, N.V., 145, 148, 150, 271
Smith, R.L., 63, 202, 203, 204, 239,
286, 290, 291
Sneyers, R., 257
Stam, A.J., 149
Stigler, S.M., 148
Strasser, H., 317
Stromberg, K., 21, 57
Stuart, A., 226
Stute, W., 219, 228, 271
Sukhatme, P.V., 36
Sweeting, T.J., 159, 202, 203
T
Tawn, J.A., 239
Teicher, H., 102
Terzakis, D., 82
Teugels, J.L., 227, 291
Thompson, W.R., 63
Tiago de Oliveira, J., 61, 149, 238, 239, 289, 291
Tippett, L.H.C., 62, 172, 201, 202
Torgersen, E.N., 317
Tricomi, F.G., 150
Y
Yackel, J.W., 81
Yamato, H., 271
Yang, S.-S., 271
Yuen, H.K., 239
V
Vaart, H.P. van der, 148
W
Wald, A., 273
Walsh, J.E., 148, 149
Watson, G., 271
Watts, V., 202
Weinstein, S.B., 204
Weiss, L., 2, 39, 103, 149, 150, 203, 238, 290, 296, 311, 313, 317
Weissman, I., 291
Weller, M., 317
Wellner, J.A., 86, 104, 150, 202
Welsh, A.H., 285, 291
Wilks, S.S., 57, 59, 63
Winter, B.B., 271
Witting, H., 274, 315
Wolfowitz, J., 104
Wu, C.Y., 195, 202
Z
Zahedi, H., 63
Zolotarev, V.M., 202
Zwet, W.R. van, 227
Subject Index
[Abbr.: o.s. = order statistic]
A
ADO (software package), 7
Annual maxima method, see Subsample
method
Associated r.v.'s, 238
Asymptotic distribution of
central o.s.'s, 145-146; see also
Asymptotic normality
extreme o.s.'s
k largest o.s.'s, 177-179
kth largest o.s.'s, 161-163
maxima, see (univariate, multivariate) Extreme value d.f.
minima, 24, 162
intermediate o.s.'s, 164, 195; see also
Asymptotic normality
Asymptotic independence of
groups of o.s.'s, 75, 121-123, 297
marginal maxima, 234-237
ratios of o.s.'s, 149
spacings, 201
Asymptotic normality of
central o.s.'s, multivariate, 229-232
central o.s.'s, univariate
strong, 22, 110-114, 131-142
weak, 108-110, 129
intermediate o.s.'s
strong, 164
weak, 109
kernel estimator, 263-264
linear combination of o.s.'s, 209-211,
215-216, 227
multinomial distributions, 150
B
Bahadur approximation, 216-220
Bandwidth, 249
Beta
function, 22
r.v., 22
Bonferroni inequality, 79, 102,
233
Bootstrap
distribution
of linear combination of o.s.'s, 228
of sample quantile, 222-226
smooth, of sample quantile, 265-268
error process, 224, 267
Borel set, 8
Brownian
bridge, 150
motion, 224
C
Cauchy distribution, 49, 199
Central limit theorem
Lindeberg-Lévy-Feller, 210
multi-dimensional, 231
Central o.s., see (central) Sequence
χ²-distance, see Distance
χ²-distribution
central, 313
noncentral, 315
Comparison of models, 275-276, 292-299, 317
Componentwise ordering, see (multivariate) O.s.'s
Concomitant, 66
Conditional
density, 52
distribution, 51
of exceedances, 54-55, 61, 78
of i.i.d. random variables given the
o.s., 60
of o.s., see (univariate) O.s.'s
of rank statistic given the o.s., 60-61
independence under Markov property, 53, 61
Confidence procedure
bootstrap, 225-226
for quantile, 247
Convex hull of data, 66, 81-82
D
Data
temperature (De Bilt), 257-260
Venice sea-level, 286
Deficiency
ε-, of models, 295, 299
of estimators, 263
Δ-monotone, 77
Density quantile function, 243, 253
Dependence function, 80
Pickands estimator of, 80; see also
Kernel estimator
D.f., see Distribution function
Dirichlet distribution, 59
Distance
χ²-, 98-102, 328-330
between induced probability measures, 102
between product measures, 100
Hellinger, 98-102, 328
between induced probability measures, 101
between product measures, 100
Kolmogorov-Smirnov, 2
Kullback-Leibler, 98-100, 328
between product measures, 100
L1-, 94, 326
variational, 94, 326
between induced probability measures, 101
between product measures, 97-98,
327, 328-330
Distribution function (d.f.)
continuity criterion, 16
degenerate, 14, 76
endpoints of, 8
multivariate, 77-78
weak convergence, 2, 194-195
Domain of attraction, see (univariate) Extreme value d.f.
Dvoretzky-Kiefer-Wolfowitz inequality, 104
E
Edgeworth expansion, 91, 140
inverse of, 141
Efficiency, 273-275, 279, 283, 284
second order, see Deficiency, of estimators
Estimator
Bayes, 274
equivariant under translations, 284
kernel, see Kernel estimator
maximum likelihood, 259-260, 277-279, 298
minimum distance, 259
nearest neighbor density, 81
orthogonal series, 269-270, 315
Pitman, 274, 298
quick, of location- and scale parameters, 212-213, 289
randomized, 274; see also Sample,
median; Sample, q-quantile
of shape parameter, 277-279, 281-283, 284-286
of tail index, 279-281, 283-284, 284-286
Exceedances, see also (truncated, empirical) Point process
multivariate, 67-68, 81
univariate, 54, 190-193
Expansion of finite length, 90
of d.f.'s, 93
of distributions of
central o.s.'s, several, 131-135
central o.s.'s, single, 114-121, 138-140, 147-148; see also Gram-Charlier series
convex combination of o.s.'s, 213-215
k largest o.s.'s, 182, 184
kth largest o.s.'s, 184
maxima, 172-176
of moments of o.s.'s, 207-208
of normal distributions, 90-91, 102
of probability measures, 91-93
of quantiles of o.s.'s, 208-209, 226
Expected loss, see Risk
Exponential
d.f., 13, 42; see also Generalized Pareto d.f.
model, 282-283
Exponential bound theorem for
i.i.d. random variables, 83-84
kernel estimator, 262
o.s.'s, 84-86, 144-145
sample d.f., 218-219
sample q.f., 87-89
Extreme o.s.'s, see (extreme) Sequence,
maximum, and minimum
Extreme value d.f., multivariate, 75-77
max-stability of, 77
Pickands representation of, 76-77,
80
Extreme value d.f., univariate, 23, 24; see also Fréchet, Gumbel, and Weibull d.f.
density of, 152
domain of attraction of, 24, 154-156,
157, 180, 194
max-stability of, 23
F
Finite expansion, see Expansion of finite
length
Fisher information, 282
matrix, 276-277
Fisher-Tippett asymptote, see (univariate)
Extreme value d.f.
type I, see Gumbel d.f.
type II, see Fréchet d.f.
type III, see Weibull d.f.
Fourier expansion, 269, 315
Fréchet
d.f., 23; see also Extreme value d.f.
illustrations, 26, 153
mode of, 153
model, 276, 279
multivariate, model, 282
semiparametric, type model, 279-280,
283-284
G
Galton difference problem, 63
Gamma
function, 22
r.v., 39-40, 59
moments of, 181-182
Generalized Pareto
density, 157
illustrations, 196-198
d.f., 42
characterization of, 37, 43, 185
type I, see Pareto d.f.
type II, 42, 196
type III, see Exponential d.f.
Gram-Charlier series, 226
Gumbel
d.f., 23; see also Extreme value d.f.
illustrations, 25, 26
method, see Subsample method
model, 276-279
H
Hellinger distance, see Distance
Hill estimator, 284-285
Homogeneous Poisson process, see Point
process
I
Independent not necessarily identically
distributed (i.n.n.i.d.) r.v.'s
distribution of the o.s. of, 36
maximum of, 21
Informative, more, 294
Intensity measure, see Point process
Inverse, generalized, 318-320; see also
Q.f.
K
Kernel
Epanechnikov, 253
method, 251-252
Kernel estimator of
density, 253, 269
illustrations, 258-259
density quantile function, 253, 260-262
dependence function, 239
d.f., 252-253, 262-264
inverse of: illustrations, 254-255
hazard function, 271
q.f., 252, 260-262, 264-265, 286-289
illustrations, 254-255, 288
Kolmogorov-Smirnov
distance, see Distance
test, see Test
L
Leadbetter's conditions, 202
Lebesgue's differentiation theorem, 71
L-statistic, see (linear combination of)
O.s.'s
M
Malmquist's result, 37-38
Marginal ordering, see Multivariate o.s.'s
Markov
kernel, 34, 293
distribution of, 34, 50, 293
property
conditional independence under, 61
of o.s.'s, 54
Maximum (also: sample maximum)
multivariate, 65
density of, 69
d.f. of, 68
univariate, 12, 21
density of, 22
dependence of, and minimum, see
Asymptotic independence
d.f. of, 20
with random index, 198, 280-281
Maximum likelihood, see Estimator
Max-stability, see Extreme value d.f.
Mean value function, see Point process
Median
multivariate, 66
univariate, 49
Minimax criterion, 274
Minimum (also: sample minimum)
multivariate, 65
density of, 69
d.f. of, 69
univariate, 12
density of, 22
d.f. of, 20
Mises, von
parametrization
of extreme value d.f.'s, 24-26
of generalized Pareto d.f.'s, 197-198
of Poisson processes, 194
-type conditions, 159-160, 199-200
Moderate deviation, see Exponential
bound theorem
Moving scheme, 249-250
illustration, 250
N
Newton-Raphson iteration, 259, 278
Normal
approximation, see Asymptotic
normality
comparison lemma, 149
distributions
expansion of, see Expansion of finite
length
moments of, 130-131
multivariate, 129-130, 146
univariate, 13
model, multivariate, 310-315
Normalization of maxima, 23, 156, 161,
200
nonlinear, 204
of normal r.v.'s, 160-161
O
Ordered distance r.v., 68
Ordering, total-ψ, 66-68
Order statistics (o.s.'s), multivariate, 65
density of, 71, 73-74
d.f. of, 69-70, 229-232, 232-237
ψ-, see Ordering
Order statistics (o.s.'s), univariate, 12
of binomial r.v.'s, 141-142
central, see (central) Sequence
conditional distribution of, given
o.s.'s, 52-54
convex combination of, 55-56
density of single, 21, 33
d.f. of, 20, 57
mode of, 49
unimodality of, 48-49
of discrete r.v.'s, 35-36, 139-142
extreme, see (extreme) Sequence
independence of, from underlying d.f.,
123-128
intermediate, see (intermediate) Sequence
joint density of
absolutely continuous case, 27-28,
30-32
continuous case, 33
discontinuous case, 35-36, 58
linear combination of, 56, 209-216,
227
local limit theorem for, 142-144
Markov property of, 54
moments of
exact, 44-45, 59-60
inequalities for, 45-47, 86-87
Q
Quantile
function (q.f.), 13, 19
continuity criterion, 320
estimation of, 286-289
parametric estimation of, 256; see
also Kernel estimator
weak convergence, 19
process, 150, 264
smooth, 264
transformation
multivariate, 81
of o.s.'s, 15, 17-18, 76
univariate, 14, 17
Quasi-quantile, 250-251
R
Ranking, see Ordering
Rank statistic, 55, 60
Regression, linear, 314
Risk, 273
Bayes, 274
S
Sample
d.f., 13, 59
oscillation of, 218-219
maximum, see Maximum
median, multivariate, 66
median, univariate, 14
randomized, 50, 60, 246
minimum, see Minimum
q.f., 13
illustrations, 250, 254-255,
288
maximum deviation of, 87-88
oscillation of, 88-89, 261
smooth, see Kernel estimator
q-quantile, 14, 247-248
randomized, 247, 268
Scheffé lemma, 95-97, 325
Sequence
of lower or upper extremes,
12
of o.s.'s
central, extreme or intermediate,
12
Skewness of extreme value density, 25-26
Smoothing technique, see Kernel method
Spacings, 29, 36-37, 147, 201, 212,
227-228
Strong convergence of unimodal probability measures, 103
Subsample method, 165, 176, 185
Sufficiency, 294-295; see also Deficiency
approximate, 295
Blackwell-, 293-295
Sukhatme's result, 36
Sum of extremes, 227
Survivor function, 69, 79, 234
Sweeting's result, 159
Systematic statistic, 211-213
T
Tail
equivalence of
densities, 157-159
d.f. 's, 156
index, 204, 280-286
Test
χ²-, 313
Kolmogorov-Smirnov, 313
of quantiles, 244-246, 268-269
Threshold
non-random, 191, 193
random, 55
Transformation
of models, see Comparison of models
technique, see Quantile, transformation; Probability integral transformation
theorem for densities, 29, 57
Trimmed mean, 68, 211, 251
Truncation
of d.f., 52, 57, 194
of point process, 191, 193
U
Unbiased estimation
expectation, 273
median, 50-51, 247, 248, 268, 274
Unimodal density, 48
mode of, 48
strongly, 48
V
Variational distance, see Distance
W
Weibull
d.f., 23, 199; see also Extreme value
d.f.
illustrations, 26, 27, 154, 258-259
mode of, 154
model, 317