
A Guide to Statistical Methods Useful in Hydrology

Keith A. Cherkauer

Created by Keith A. Cherkauer on April 16th, 1997
Revised March 8th, 2007

K. A. Cherkauer
Department of Agricultural and Biological Engineering
Purdue University
225 S. University St.
West Lafayette, IN 47907
cherkaue at purdue.edu

TABLE OF CONTENTS

1) SAMPLE STATISTICS
   a) Probability Density Function (PDF)
   b) Cumulative Distribution Function (CDF)
   c) Independence
   d) Expectations
   e) Moments
   f) Moment Generating Functions
   g) Cumulants
   h) Transformation of Variables

2) PROBABILITY DISTRIBUTIONS
   a) Extreme Value Family
      ii) Type 1 or Gumbel Distribution
      iii) Type 2 or Frechet Distribution
      iv) Type 3 Distribution
      v) General Extreme Value (GEV)
      vi) Weibull Distribution
   b) Gamma Family (Pearson): log Pearson
      ii) Exponential distribution
      iii) Gamma distribution
      iv) Pearson Type 3
      v) log Pearson Type 3
   c) Log-Normal Family
      i) Normal or Gaussian Distribution
      ii) 2 Parameter Log Normal Distribution
      iii) 3 Parameter Log Normal Distribution
   d) Discrete Distributions
      i) Binomial Distribution
      ii) Poisson Distribution
   e) Other Useful Distributions
      i) T-Distribution
      ii) Chi-squared Distribution
      iii) F-Distribution

3) PARAMETER ESTIMATION

   a) Maximum likelihood
   b) Method of Moments

4) HYPOTHESIS TESTING
   a) Test Basics
   b) Types of Statistical Tests

5) REGRESSION ANALYSIS
   b) Ordinary Least Squares
   c) General Least Squares
   d) Multiple Regression

6) TREND ANALYSIS

7) TIME SERIES ANALYSIS
   b) Stationary Time Series
   c) Correlation
   d) Storage-Related Statistics
   e) Time-Series Models

8) PROBABILITY OF EXTREMES

9) ANALYSIS OF VARIANCE
   a) ANOVA (ANalysis Of VAriance)
   b) Regression Approach to ANOVA
   c) Standard Regression

10) BIBLIOGRAPHY


1) Sample Statistics
a) Probability Density Function (PDF)
i) If f(X) is a PDF, then f(X) ≥ 0 for all X ∈ ℜ, and
ii) \int_{-\infty}^{\infty} f(X)\,dX = 1, and
iii) P(a < X < b) = \int_{a}^{b} f(X)\,dX.
b) Cumulative Distribution Function (CDF)
i) The cumulative distribution F(X) of a continuous random variable X with density function f(X) is given by F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, for -\infty < x < \infty.
c) Independence
i) Let X and Y be two random variables, discrete or continuous, with joint probability distribution ƒ(X,Y),
and marginal distributions g(X) and h(Y), respectively. The random variables X and Y are said to be
independent if and only if ƒ(X,Y) = g(X)h(Y), for all (x,y) within their range. Or events A and B are
independent if the probability of A occurring after B has occurred, P(A|B), is equal to the probability of A
occurring, P(A).
d) Expectations

i) The mean or mathematical expectation of a random variable X with PDF f(X) is E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, or E(X) = \sum_{i=1}^{N} x_i f(x_i) if the distribution is discrete.
ii) Mean ≡ E(X) = \mu_x. The sample mean is \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.

iii) Variance ≡ E[X - E(X)]^2 = \sigma^2. The sample variance is S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2.
iv) Covariance ≡ COV(XY) = E\{[X - E(X)][Y - E(Y)]\} = \sigma_{xy}. The sample covariance is S_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}).
v) Linear combinations of expectations:
• E(aX ± b) = aE(X) ± b
• E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]
• E[g(X,Y) ± h(X,Y)] = E[g(X,Y)] ± E[h(X,Y)]
• E(XY) = E(X)E(Y), provided X and Y are independent
vi) For an unbiased estimator E(θ’) = θ, where θ’ is the sample estimator, and θ is the population parameter.
For example: if S2 is the unbiased estimator of σ2, then E(S2) = σ2.
Example: Show that the sample variance, S^2, is an unbiased estimator of the population variance, \sigma^2; i.e., E(S^2) = \sigma^2.

E(S^2) = E\left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2\right]

\sum_{i=1}^{N}(x_i - \bar{x})^2 = \sum_{i=1}^{N}\left[(x_i - \mu) - (\bar{x} - \mu)\right]^2
  = \sum_{i=1}^{N}(x_i - \mu)^2 - 2(\bar{x} - \mu)\sum_{i=1}^{N}(x_i - \mu) + N(\bar{x} - \mu)^2

E(S^2) = \frac{1}{N-1}\left\{\sum_{i=1}^{N}E(x_i - \mu)^2 - 2\sum_{i=1}^{N}E\left[(x_i - \mu)(\bar{x} - \mu)\right] + \sum_{i=1}^{N}E(\bar{x} - \mu)^2\right\}

\sum_{i=1}^{N}E(x_i - \mu)^2 = \sum_{i=1}^{N}\sigma_{x_i}^2 = N\sigma^2

\sum_{i=1}^{N}E(\bar{x} - \mu)^2 = \sum_{i=1}^{N}\sigma_{\bar{x}}^2 = N\frac{\sigma^2}{N} = \sigma^2

2\sum_{i=1}^{N}E\left[(x_i - \mu)(\bar{x} - \mu)\right] = 2\sum_{i=1}^{N}E\left[(x_i - \mu)\left(\frac{x_1 + x_2 + \dots + x_i + \dots + x_N}{N} - \mu\right)\right]
  = \frac{2}{N}\sum_{i=1}^{N}E\left\{(x_i - \mu)\left[(x_1 - \mu) + (x_2 - \mu) + \dots + (x_i - \mu) + \dots + (x_N - \mu)\right]\right\}
  = \frac{2}{N}\sum_{i=1}^{N}E\left[(x_1 - \mu)(x_i - \mu) + (x_2 - \mu)(x_i - \mu) + \dots + (x_i - \mu)(x_i - \mu) + \dots + (x_N - \mu)(x_i - \mu)\right]
  = \frac{2}{N}\sum_{i=1}^{N}E(x_i - \mu)^2 = \frac{2}{N}\sum_{i=1}^{N}\sigma_{x_i}^2 = \frac{2}{N}N\sigma^2 = 2\sigma^2

(the cross terms vanish because the x_i are independent)

Note: \sigma_{x_i}^2 = \sigma^2, for i = 1, 2, …, N; and \sigma_{\bar{x}}^2 = \sigma^2/N.

E(S^2) = \frac{1}{N-1}\left[N\sigma^2 - 2\sigma^2 + \sigma^2\right] = \frac{(N-1)\sigma^2}{N-1} = \sigma^2
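A quick numerical illustration of this result (synthetic data; numpy assumed): dividing by N − 1 (ddof=1) gives an estimator whose average over many trials is close to the true σ², while dividing by N underestimates it by the factor (N − 1)/N.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
N, trials = 10, 20000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, N))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by N - 1
s2_biased = samples.var(axis=1, ddof=0)     # divide by N

print(s2_unbiased.mean())   # approximately 4.0
print(s2_biased.mean())     # approximately 4.0 * (N - 1) / N = 3.6
```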

e) Moments
i) The rth moment for the probability density function f(x) is \mu_r' = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx.
(1) The zero moment, \mu_0' = 1.
(2) The first moment, \mu_1' = E(X) = \mu_x.
(3) The second moment, \mu_2' = E(X^2) = \mu_x^2 + \sigma_x^2.
ii) The rth central moment, or moment taken about the mean, for the PDF f(x) is \mu_r^C = E[X - E(X)]^r = \int_{-\infty}^{\infty}(X - \mu_X)^r f(X)\,dX.
(1) The first central moment, \mu_1^C = 0.
(2) The second central moment, \mu_2^C = E[X - E(X)]^2 = \sigma_x^2.
(3) The third central moment, \mu_3^C = E[X - E(X)]^3 = \gamma\sigma_x^3.
f) Moment Generating Functions
i) The moment generating function of a probability distribution function, f(x), is:
M_X(t) = E(e^{xt}) = \int_{-\infty}^{\infty} e^{xt} f(x)\,dx.
ii) To find the rth moment, \mu_r', evaluate \mu_r' = \left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}.
(1) Example: Find the first and second moments for an exponential distribution.


The PDF is f(x) = \frac{1}{\beta}\exp\left[-\frac{x - x_0}{\beta}\right].

First solve for the moment generating function:

M_x(t) = E(e^{xt}) = \int_{x_0}^{\infty} e^{xt} f(x)\,dx = \int_{x_0}^{\infty} \frac{e^{xt}}{\beta}\, e^{-(x - x_0)/\beta}\,dx

M_x(t) = \frac{1}{\beta}\int_{x_0}^{\infty} \exp\left[\frac{xt\beta - x + x_0}{\beta}\right] dx
       = \frac{e^{x_0/\beta}}{\beta}\int_{x_0}^{\infty} \exp\left[-\frac{(1 - t\beta)x}{\beta}\right] dx
       = \frac{e^{x_0/\beta}}{\beta}\left[-\frac{\beta}{1 - t\beta}\exp\left(-\frac{(1 - t\beta)x}{\beta}\right)\right]_{x_0}^{\infty}
       = \frac{e^{x_0/\beta}}{1 - t\beta}\exp\left[-\frac{(1 - t\beta)x_0}{\beta}\right] = \frac{e^{t x_0}}{1 - t\beta}

Now solve for the first moment:

\left.\frac{dM_x(t)}{dt}\right|_{t=0} = \left.\frac{d}{dt}\left[\frac{e^{t x_0}}{1 - t\beta}\right]\right|_{t=0} = \left[\frac{\beta e^{t x_0}}{(1 - t\beta)^2} + \frac{x_0 e^{t x_0}}{1 - t\beta}\right]_{t=0} = \beta + x_0 = \mu_1'

and the second moment:

\left.\frac{d^2 M_x(t)}{dt^2}\right|_{t=0} = \left.\frac{d}{dt}\left[\frac{\beta e^{t x_0}}{(1 - t\beta)^2} + \frac{x_0 e^{t x_0}}{1 - t\beta}\right]\right|_{t=0}
 = \left[\frac{2\beta^2 e^{t x_0}}{(1 - t\beta)^3} + \frac{2 x_0 \beta e^{t x_0}}{(1 - t\beta)^2} + \frac{x_0^2 e^{t x_0}}{1 - t\beta}\right]_{t=0}
 = 2\beta^2 + 2 x_0 \beta + x_0^2 = \mu_2'

Note: the variance, or second central moment, is:

\sigma^2 = \mu_2^C = E[X - E(X)]^2 = \mu_2' - (\mu_1')^2 = 2\beta^2 + 2 x_0 \beta + x_0^2 - (\beta + x_0)^2 = \beta^2
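As a quick symbolic check of the algebra above (SymPy assumed; symbol names are illustrative), differentiating the derived MGF recovers the same moments and variance:

```python
import sympy as sp

t, x0, beta = sp.symbols('t x0 beta', positive=True)

# MGF of the shifted exponential, as derived above (valid for t < 1/beta)
M = sp.exp(t * x0) / (1 - t * beta)

mu1 = sp.diff(M, t, 1).subs(t, 0)             # beta + x0
mu2 = sp.expand(sp.diff(M, t, 2).subs(t, 0))  # 2*beta**2 + 2*beta*x0 + x0**2
var = sp.simplify(mu2 - mu1**2)               # beta**2

print(mu1, mu2, var)
```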
g) Cumulants
i) Cumulants are functions of the moments; cumulants of second and higher order are invariant to shifts in the mean.
h) Transformation of Variables
i) Suppose that X is a continuous random variable with PDF f(X). Let Y = u(X) define a one-to-one correspondence between the values of X and Y, so that the equation y = u(x) can be uniquely solved for x in terms of y, say x = w(y). Then the PDF of Y is g(y) = f[w(y)]\left|\frac{dx}{dy}\right|.
Example: Log-Normal distribution

\phi(y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{1}{2}\left(\frac{y - \mu_y}{\sigma_y}\right)^2\right], where y = ln(x).

f(x) = \phi(y)\frac{dy}{dx} = \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right]\frac{d[\ln(x)]}{dx}
     = \frac{1}{x\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right]


2) Probability Distributions
a) Extreme Value Family
i) The extreme value distribution comes from the analysis of extreme events. Given a data set {X_1, X_2, …, X_N}, the extreme value is M = max(X_i). If the X_i are, for example, daily streamflows for one year, then M is the annual maximum streamflow. Given multiple years of data, the extreme values will gradually approach a distribution that is a member of the extreme value family. If the X_i are themselves from an extreme value distribution, then the M's will have the same extreme value distribution. The branch of the family is determined by the value of k.

[Figure: the extreme value family plotted against the Type 1 reduced variate, y_1, via x = u + \alpha(1 - e^{-k y_1})/k: EV1, k = 0 (solid); EV2, k < 0 (long dash); EV3, k > 0 (short dash). Increasing |k| increases the curvature.]

ii) Type 1 or Gumbel Distribution


(1) If k = 0, then X is said to have a Type 1 (EV1), or Gumbel, distribution. The EV1 will appear as a straight line on extreme value plotting paper.
(2) The PDF is f(x_1) = \frac{1}{\alpha}\exp\left[-(x_1 - u)/\alpha - e^{-(x_1 - u)/\alpha}\right].
(3) The CDF is F(x_1) = \exp\left(-e^{-(x_1 - u)/\alpha}\right).
(4) Where u is the location parameter, and α is the scale parameter.
(5) The mode is at x_1 = u.
(6) Mean = \mu_1' = E(x_1) = u + \alpha C = u + 0.5772\alpha, where C is Euler's constant.
(7) Variance = \sigma^2 = E[x_1 - E(x_1)]^2 = \frac{\pi^2\alpha^2}{6}.
(8) Skewness = g = 1.14.
(a) The standardized variate is y_1 = (x_1 - u)/\alpha, with PDF g(y_1) = \exp(-y_1 - e^{-y_1}), and DF G(y_1) = \exp(-e^{-y_1}).
(b) NOTE: extreme value plotting paper plots values of y on a linear scale against values of x, the original data, also on a linear scale.
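As an illustration of how an EV1 fit is used in practice, the sketch below (hypothetical annual-maximum values; scipy assumed) fits a Gumbel distribution by maximum likelihood and estimates the 100-year flood as the quantile with annual exceedance probability 1/100.

```python
import numpy as np
from scipy import stats

# Hypothetical annual maximum flows (m^3/s); replace with an observed series
annual_max = np.array([212., 310., 185., 265., 198., 402., 250., 288., 231., 356.])

# Fit the EV1 (Gumbel) distribution: loc corresponds to u, scale to alpha
u, alpha = stats.gumbel_r.fit(annual_max)

# Magnitude of the T-year event: F(x) = 1 - 1/T
T = 100.0
x_T = stats.gumbel_r.ppf(1.0 - 1.0 / T, loc=u, scale=alpha)
print(f"u = {u:.1f}, alpha = {alpha:.1f}, estimated {T:.0f}-yr flood = {x_T:.1f} m^3/s")
```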
iii) Type 2 or Frechet Distribution
(1) If k < 0, then X is said to have a Type 2 (EV2), or Frechet, distribution. If X has an EV2 distribution, then log X has an EV1 distribution, so the EV2 is also referred to as the log Gumbel distribution. The EV2 will appear as a line curving up on extreme value plotting paper.
(2) The CDF is F(x_2) = \exp\left\{-[1 - k(x_2 - u)/\alpha]^{1/k}\right\}, where k < 0, and (u + \alpha/k) \le x_2 \le \infty.


(3) The PDF is f(x_2) = \frac{1}{\alpha}\left[1 - \frac{k(x_2 - u)}{\alpha}\right]^{1/k - 1}\exp\left\{-[1 - k(x_2 - u)/\alpha]^{1/k}\right\}.

(4) Where u is the location parameter, α is the scale parameter, and k is the shape parameter.
(5) The standardized variate is y2 = 1 - k(x2 – u)/α, 0 ≤ y2 ≤ ∞; with PDF, g(y2) = [(-y21/k-1)exp(-y21/k)]/k;
and DF, G(y2) = exp(-y21/k).
(6) The values of G(y2) depend on k, so values must be tabulated for different values of k.
iv) Type 3 Distribution
(1) If k > 0, then X is said to have a Type 3 (EV3) distribution. If X has an EV3 distribution, then −X is said to have a Weibull distribution. The EV3 will appear as a line curving down on extreme value plotting paper.
(2) The CDF is F(x_3) = \exp\left\{-[1 - k(x_3 - u)/\alpha]^{1/k}\right\}, where k > 0, and -\infty \le x_3 \le (u + \alpha/k).
(3) The PDF is f(x_3) = \frac{1}{\alpha}\left[1 - \frac{k(x_3 - u)}{\alpha}\right]^{1/k - 1}\exp\left\{-[1 - k(x_3 - u)/\alpha]^{1/k}\right\}.
(4) Where u is the location parameter, α is the scale parameter, and k is the shape parameter.
(5) The standardized variate is -y_3 = 1 - k(x_3 - u)/\alpha, -\infty \le y_3 \le 0; with PDF g(y_3) = \{(-y_3)^{1/k - 1}\exp[-(-y_3)^{1/k}]\}/k, and DF G(y_3) = \exp[-(-y_3)^{1/k}].
(6) The values of G(y_3) depend on k, so values must be tabulated for different values of k.
v) General Extreme Value (GEV)
(1) The GEV has the same form as the EV2 or EV3, except that k is unrestricted. It is used when the type of extreme value distribution needed is unknown. When fitted to the data, the GEV returns an estimated value of k. If k is close to 0, the data should be reanalyzed with an EV1 distribution, since the EV2 and EV3 distributions approach the EV1 as k → 0.
(2) The variate of the GEV, x, is related to the variate of the EV1, y_1, by x = u + \alpha\left(\frac{1 - e^{-k y_1}}{k}\right).
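A minimal sketch of a GEV fit (same hypothetical annual-maximum series as above; scipy assumed). scipy's `genextreme` uses a shape parameter c that appears to follow the same sign convention as k here (c = 0 is the Gumbel limit), but the sign convention is worth confirming against the documentation before relying on it.

```python
import numpy as np
from scipy import stats

# Hypothetical annual maximum series; replace with observed data
annual_max = np.array([212., 310., 185., 265., 198., 402., 250., 288., 231., 356.])

# Fit the GEV; if the estimated shape is near zero, an EV1 fit is probably adequate
k, u, alpha = stats.genextreme.fit(annual_max)

print(f"k = {k:.3f}, u = {u:.1f}, alpha = {alpha:.1f}")
print("100-yr event:", stats.genextreme.ppf(0.99, k, loc=u, scale=alpha))
```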
vi) Weibull Distribution
(1) As noted before, the Weibull distribution is a variation of the EV3 for minimum values. If X has an EV3 distribution, then −X has a Weibull distribution. This is an important distribution with useful properties, so it has been included as a separate category.
(2) The PDF is f(x) = \frac{k}{\alpha}\left(\frac{x}{\alpha}\right)^{k-1}\exp\left[-\left(\frac{x}{\alpha}\right)^k\right].
(3) The CDF is F(x) = 1 - \exp\left[-\left(\frac{x}{\alpha}\right)^k\right].
(4) Where x > 0; α, k > 0.
(5) Mean = \mu_1' = E(x) = \alpha\Gamma\left(1 + \frac{1}{k}\right).
(6) Variance = \sigma^2 = E[x - E(x)]^2 = \alpha^2\left[\Gamma\left(1 + \frac{2}{k}\right) - \Gamma^2\left(1 + \frac{1}{k}\right)\right].
(7) If k = 1, the distribution becomes exponential.
(8) If X has a Weibull distribution, then Y = −ln(X) has a Gumbel (EV1) distribution.

b) Gamma Family (Pearson): log Pearson


i) The Pearson Type 3 distribution has three parameters: x_0, the location parameter, which determines the lower bound of the distribution; β, the scale parameter; and γ, the shape parameter. There are two special cases: when x_0 = 0, the gamma distribution; and when γ = 1, the exponential distribution.
ii) Exponential distribution


(1) The PDF is f(x) = \frac{1}{\beta} e^{-(x - x_0)/\beta}.
(2) The CDF is F(x) = 1 - e^{-(x - x_0)/\beta}.
(3) Mean = E(x) = \mu_1' = x_0 + \beta.
(4) Variance = E[x - E(x)]^2 = \mu_2 = \beta^2.
(5) Skewness = g = 2.
(6) The standardized variate is y = (x - x_0)/\beta, with PDF g(y) = e^{-y}, and CDF G(y) = 1 - e^{-y}.

[Figure: exponential distribution PDFs, β = 10 (solid) and β = 5 (dashed).]

iii) Gamma distribution


(1) The PDF is f(x) = \frac{x^{\gamma - 1} e^{-x/\beta}}{\beta^{\gamma}\Gamma(\gamma)}.
(2) The CDF is F(x) = \int_0^x \frac{x^{\gamma - 1} e^{-x/\beta}}{\beta^{\gamma}\Gamma(\gamma)}\,dx.
(3) Mean = E(x) = \mu_1' = \beta\gamma.
(4) Variance = E[x - E(x)]^2 = \mu_2 = \beta^2\gamma.
(5) Skewness = g = 2/\gamma^{1/2}.
(6) The standardized variate is y = x/\beta, with PDF g(y) = \frac{y^{\gamma - 1} e^{-y}}{\Gamma(\gamma)}, and CDF G(y) = \int_0^y \frac{y^{\gamma - 1} e^{-y}}{\Gamma(\gamma)}\,dy.

[Figure: gamma distribution PDFs with γ = 1.2 (solid) and γ = 1.5 (dashed).]

iv) Pearson Type 3

(1) The PDF is f(x) = \frac{(x - x_0)^{\gamma - 1} e^{-(x - x_0)/\beta}}{\beta^{\gamma}\Gamma(\gamma)}.
(2) The CDF is F(x) = \int_{x_0}^{x} \frac{(x - x_0)^{\gamma - 1} e^{-(x - x_0)/\beta}}{\beta^{\gamma}\Gamma(\gamma)}\,dx.
(3) Mean = E(x) = µ1’ = x0 + βγ.
(4) Variance = E[x – E(x)]2 = µ2 = β 2γ.
(5) Skewness = g = 2/γ1/2.
(6) The standardized variate is the same as that for the gamma distribution.
v) log Pearson Type 3
(1) Use Z = ln(x) for a 3 parameter log Pearson Type 3 of x, or Z = ln(x-x0), for a more general 4
parameter log Pearson Type 3 distribution of x. For the 3 parameter log Pearson Type 3:
(2) The PDF is f(x) = \frac{(\ln x - z_0)^{\eta - 1} e^{-(\ln x - z_0)/\lambda}}{x\,\lambda^{\eta}\,\Gamma(\eta)}.
(3) Mean = E(z) = µ1’ = z0 + λη.
(4) Variance = E[z – E(z)]2 = µ2 = λ2η.
(5) Skewness = g = 2/η1/2.
c) Log-Normal Family
i) Normal or Gaussian Distribution
(1) PDF: f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]
(2) If Y = \frac{1}{N}(X_1 + X_2 + \dots + X_N), the mean of N variables with each X_i \in N(\mu, \sigma^2), then \mu_y = \frac{1}{N}(\mu_1 + \mu_2 + \dots + \mu_N) = \mu, and \sigma_y^2 = \frac{1}{N^2}(\sigma_1^2 + \sigma_2^2 + \dots + \sigma_N^2) = \frac{\sigma^2}{N}.
(3) The moments µ and σ² are the normal distribution's natural parameters, and can be approximated using the sample mean and variance: \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, and S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2. The sample mean is also the maximum likelihood estimator of µ; the maximum likelihood estimator of σ² divides by N rather than N − 1 (see Section 3).
ii) 2 Parameter Log Normal Distribution

(1) f(x) = \frac{1}{x\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right].
(2) This distribution may be used if data show a positive skew. Skewness can be determined from the coefficient of variation, CV = \sigma/\mu, as \gamma_x = 3CV_x + CV_x^3. As CV → 0, the log-normal distribution approaches the normal distribution.
(3) Distribution parameters estimated by the method of moments:
\sigma_y = \left[\ln\left(1 + \frac{\sigma_x^2}{\mu_x^2}\right)\right]^{1/2}, and \mu_y = \ln(\mu_x) - \frac{\sigma_y^2}{2}.


iii) 3 Parameter Log Normal Distribution

(1) f(x) = \frac{1}{(x - a)\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{1}{2}\left(\frac{\ln(x - a) - \mu_y}{\sigma_y}\right)^2\right].
(2) Subtracting off a lower-bound parameter, a, will sometimes make Y = ln(X − a) normally distributed.
(3) The method of moments is inefficient for the LN3 distribution. A simple and efficient method to estimate the parameters makes use of the quantile lower-bound estimator:
\hat{a} = \frac{x_1 x_N - x_{median}^2}{x_1 + x_N - 2 x_{median}}.
Once the lower bound has been estimated, µ_y and σ_y can be found using the sample mean and variance calculations from y = ln(x − a). This method compares favorably with the maximum likelihood estimators for LN3.
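A minimal sketch of the quantile lower-bound approach (numpy assumed; x_1 and x_N taken as the smallest and largest observations, and the sample values below are purely illustrative):

```python
import numpy as np

def ln3_quantile_fit(x):
    """Fit a 3-parameter log-normal using the quantile lower-bound estimator."""
    x = np.sort(np.asarray(x, dtype=float))
    x1, xn, xmed = x[0], x[-1], np.median(x)
    a_hat = (x1 * xn - xmed**2) / (x1 + xn - 2.0 * xmed)   # quantile lower bound
    y = np.log(x - a_hat)                                   # remaining parameters from y = ln(x - a)
    return a_hat, y.mean(), y.std(ddof=1)

# Hypothetical positively skewed sample (e.g., annual low flows)
sample = np.array([12.1, 15.3, 9.8, 22.4, 18.0, 11.2, 30.5, 14.7, 16.9, 10.4])
print(ln3_quantile_fit(sample))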
d) Discrete Distributions
i) Binomial Distribution
(1) The definition of a binomial coefficient is
\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!} = \frac{n!}{k!(n-k)!} = \binom{n}{n-k}.
(2) A Bernoulli trial can result in a success with probability p, and a failure with probability q = 1 − p. The probability distribution of the binomial random variable X (the number of successes in n independent trials) is b(x; n, p) = \binom{n}{x} p^x q^{n-x}, x = 0, 1, 2, …, n.
(3) The mean is µ = np.
(4) The variance is σ² = npq.
ii) Poisson Distribution
(1) The probability distribution of the Poisson random variable X, representing the number of outcomes occurring in a given time interval or specified region denoted by t, is given by
p(x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}, x = 0, 1, 2, …; where λ is the average number of outcomes per unit time or region.
(2) A Poisson process:
(a) Has no memory. Outcomes occurring in one time interval or region are independent of outcomes in any other time interval or region.
(b) The probability of an outcome occurring during a time step or within a region is proportional to the length of the time step or the size of the region, and does not depend on anything in another time step or region.
(c) The probability that more than one outcome will occur in such a short time interval or fall within
such a small region is negligible.
(3) The mean and variance are µ = σ2 = λt.
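As a small illustration (hypothetical rate; scipy assumed): if threshold-exceeding storms arrive as a Poisson process at λ = 2 events per year, the chance of more than 4 events in a single year follows directly from the Poisson CDF.

```python
from scipy import stats

lam, t = 2.0, 1.0          # hypothetical rate (events/yr) and interval (yr)
mu = lam * t

p_more_than_4 = 1.0 - stats.poisson.cdf(4, mu)     # P(X > 4)
mean, var = stats.poisson.stats(mu, moments='mv')  # both equal lambda*t
print(p_more_than_4, mean, var)
```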
e) Other Useful Distributions
i) T-Distribution
(1) The T statistic is T = \frac{\bar{X} - \mu}{S/\sqrt{N}} = \frac{Z}{\sqrt{V/\nu}}, which is similar to the normally distributed Z statistic, except that the sample estimated standard deviation, S, is used instead of the population standard deviation, σ. The variable V is a chi-squared random variable with ν degrees of freedom.
(2) The distribution, for ν degrees of freedom, is given by h(t) = \frac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}, -∞ < t < ∞.
ii) Chi-squared Distribution


(1) The χ² statistic is \chi^2 = \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2}, where the X_i are normally distributed random variables with population mean µ and variance σ², and S² is the estimated (sample) variance.
(2) The distribution function for ν degrees of freedom is given by
f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}
(3) Mean = µ = ν
(4) Variance = σ2 = 2ν
(5) If Vi ∈ { V1, V2, …, VN }, and Vi has a chi-squared distribution, then the sum of all Vi’s also has a
chi-squared distribution.
iii) F-Distribution
(1) The F statistic is F = \frac{U/\nu_1}{V/\nu_2}, where U and V are independent random variables having chi-squared distributions with ν_1 and ν_2 degrees of freedom.
(2) The distribution, with ν_1 and ν_2 degrees of freedom, is given by
h(f) = \begin{cases} \dfrac{\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\,\dfrac{f^{\nu_1/2 - 1}}{(1 + \nu_1 f/\nu_2)^{(\nu_1 + \nu_2)/2}}, & 0 < f < \infty \\ 0, & \text{elsewhere} \end{cases}
(3) f_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{f_{\alpha}(\nu_2, \nu_1)}


3) Parameter Estimation
a) Maximum likelihood
i) Maximum likelihood estimates a distribution's parameters by finding the values most likely to have produced the sample data.
ii) The likelihood function for the PDF f(X|A) is: L(X|A) = \prod_{i=1}^{N} f(X_i|A),
where X is the set of random observations to be described (X_1, X_2, …, X_N), and A is the set of distribution parameters (A_1 = µ, A_2 = σ², …).
iii) The parameters are found by maximizing the likelihood function with respect to the various parameters: \frac{\partial L(X|A)}{\partial A_1} = 0, \frac{\partial L(X|A)}{\partial A_2} = 0, …, and then solving for those parameters.
iv) In some instances it may be easier to maximize the log likelihood, ln[L(X|A)]. The values of A that maximize the likelihood function also maximize the log likelihood.
v) Example: Find the maximum likelihood estimators of the normal distribution

L(X|\mu, \sigma^2) = \prod_{i=1}^{N} f(x_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}[(x_i - \mu)/\sigma]^2}
 = \frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^N}\, e^{-\frac{1}{2}\sum[(x_i - \mu)/\sigma]^2}

Setting the partial derivatives, ∂L/∂µ and ∂L/∂σ², to zero and solving for the distribution parameters yields the maximum likelihood estimators:

\hat{\mu} = \frac{1}{N}\sum x_i = m_1

\hat{\sigma}^2 = \frac{1}{N}\sum (x_i - \bar{x})^2 = m_2
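A quick numerical check of these estimators (synthetic data; scipy assumed): `norm.fit` maximizes the likelihood, so its scale estimate matches the 1/N form of σ̂ rather than the unbiased 1/(N−1) sample variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=500)   # synthetic sample

mu_hat, sigma_hat = stats.norm.fit(x)            # maximum likelihood estimates
print(mu_hat, sigma_hat)
print(x.mean(), x.std(ddof=0))                   # same values from the closed-form estimators
```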
b) Method of Moments
i) Using the moment generating formula, produce the same number of moments as the distribution has
parameters.
ii) Solve for the desired parameters.
iii) Compute the moments from the data set (X1, X2, …, XN), and use them to calculate estimates for the
distribution parameters.
iv) Example: Find the moment estimators for the exponential distribution:
The mean, µ, and standard deviation, σ, are related to the distribution parameters x_0 and β by
µ = x_0 + β
σ = β
Therefore, if x_0 is known and the sample mean \bar{x} is the estimate of µ, \hat{\beta} is obtained from \hat{\beta} = \bar{x} - x_0.
If both x_0 and β are unknown, then the equations for both parameters must be used. First \hat{\beta} = \hat{\sigma}, where \hat{\sigma} is the sample estimate of σ; then \hat{x}_0 = \bar{x} - \hat{\beta} = \bar{x} - \hat{\sigma}.
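A minimal sketch of the two-parameter case (synthetic data; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
x = 5.0 + rng.exponential(scale=3.0, size=1000)   # shifted exponential: x0 = 5, beta = 3

beta_hat = x.std(ddof=1)          # beta estimated by the sample standard deviation
x0_hat = x.mean() - beta_hat      # x0 estimated from the sample mean
print(x0_hat, beta_hat)
```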


4) Hypothesis Testing
a) Test Basics
i) The Null Hypothesis, H0, is the condition assumed by the tester to be true.
ii) The Alternative Hypothesis, H1, is the condition assumed by the tester to be true if H0 is rejected.
iii) Test probability table:
                  H0 True               H1 True
H0 Accepted       Significance (1-α)    Type II Error (β)
H0 Rejected       Type I Error (α)      Power (1-β)

(1) The significance level, α, is set by the tester, and determines the probability that the null hypothesis
will be rejected if it is true.
(2) β, the Type II Error, is the probability of accepting the null hypothesis when it is in fact false.
(3) The power of the test, 1-β, can be determined if a specific alternative hypothesis is set, otherwise it is
unknown.
iv) A test statistic is computed from the data, and compared against the expected statistic computed from α.
Note that the distribution used in the comparison is that of the test statistic, not of the sampled data, or
population.
b) Types of Statistical Tests
Parametric tests assume a distribution, usually a normal distribution, while non-parametric tests make no assumption about the statistic's distribution.
i) Parametric Tests
(1) A Z test assumes a normal or Gaussian distribution, with both the population mean and variance known. The test statistic is of the form Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}. This test is usually used to compare the means of two sets of normally distributed independent random data. The Central Limit Theorem states that if Z_1, Z_2, …, Z_N are random samples from any distribution f(X), then the mean of the random sample will be approximately normally distributed for large N.
(2) A T test is similar to the Z test, except that the population variance is not known; instead the sample variance is used, and the test assumes a t-distribution. The resulting form of the test statistic is t = \frac{\bar{x} - \mu}{S/\sqrt{N}}. The t-distribution will be wider than the normal distribution, to compensate for the use of the sample variance, but as N → ∞ the t-distribution approaches the normal distribution. The t-test assumes that the data are normally distributed around their respective means, and that they have the same variance.
(a) A paired t-test is commonly used for evaluating matched pairs of data. The test is conducted on the differences between the data sets, D_i = X_i − Y_i, so the differences must be normally distributed. (A short numerical sketch follows the list of parametric tests below.)
(3) The chi-squared test assumes a chi-squared distribution, which is a measure of the variance. The test statistic is of the form \chi^2 = \sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(N-1)S^2}{\sigma^2}, where X is an independent random variable sampled from a population with mean µ and standard deviation σ. The distribution has N − 1 degrees of freedom, where N is the number of samples.
(4) The f-test uses an F distribution, which is the ratio of two chi-squared distributed variables, each divided by its degrees of freedom: F(x, \nu_1, \nu_2) = \frac{\chi_1^2(x, \nu_1)/\nu_1}{\chi_2^2(x, \nu_2)/\nu_2}. It is used extensively for comparing the variances between two sets of random variables.
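Referring back to the paired t-test in (2)(a), a paired comparison of matched observations (hypothetical numbers; scipy assumed) can be run as:

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations, e.g., peak flows before/after a land-use change
before = np.array([14.2, 9.8, 11.5, 13.1, 10.4, 12.7])
after  = np.array([15.0, 10.9, 11.2, 14.6, 11.8, 13.3])

t_stat, p_value = stats.ttest_rel(after, before)   # paired t-test on D_i = after - before
print(t_stat, p_value)
```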
ii) Non-Parametric Tests
(1) The rank-sum test (also called the Wilcoxon, Mann-Whitney, or Wilcoxon-Mann-Whitney rank-sum test) is used to test whether one group of data tends to produce larger observations than a second independent group. It makes no assumptions about the types of distribution, but the data must be homoscedastic.
• First, the combined data set is ranked by the magnitude of the values: the greatest value receiving the highest rank, and ties being assigned their average rank.
• Next, the ranks of the smaller data set are summed and compared against the probability distribution created from all possible outcomes (bounded by the sum of the lowest rankings and the sum of the highest rankings). For large sets of data (N > 10), a Z test can be used on the ranks.
Example: Determine the probability distribution for Xi ∈ { X1, X2, X3 }, and Yi ∈ { Y1, Y2, Y3, Y4 }:
Possible Rank Combinations
1,2,3 1,3,4 1,4,5 1,5,6 1,6,7
1,2,4 1,3,5 1,4,6 1,5,7
1,2,5 1,3,6 1,4,7
1,2,6 1,3,7
1,2,7

Sums of Ranks
6 8 10 12 14
7 9 11 13
8 10 12
9 11
10

Table of Possible Rank Sums and Probabilities

Rank Sum   Count   Probability        Rank Sum   Count   Probability
   6         1       0.067              11         2       0.133
   7         1       0.067              12         2       0.133
   8         2       0.133              13         1       0.067
   9         2       0.133              14         1       0.067
  10         3       0.200

Number of possible rank combinations: 15

[Figure: probability distribution of the rank sum (values 6 through 14), peaking at a probability of 0.20 for a rank sum of 10.]
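In practice the rank-sum comparison is usually done with a library routine; the sketch below (hypothetical concentrations; scipy assumed) uses `mannwhitneyu`, which implements the same Wilcoxon-Mann-Whitney test.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, e.g., nitrate concentrations from two catchments
upstream = np.array([1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0])
downstream = np.array([1.6, 1.9, 1.4, 2.1, 1.7, 1.8])

stat, p_value = stats.mannwhitneyu(upstream, downstream, alternative='two-sided')
print(stat, p_value)   # reject H0 of identical distributions if p_value < alpha
```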

(2) The signed-rank sum test (also the Wilcoxon signed-rank test) is used to determine if the median
difference between two paired sets of data (X, Y), is equal to zero. It can also be used to test if the
median of one set of data is significantly different from zero.
• First, the absolute value of the differences between the data sets (Di = Xi – Yi), are ranked, with
highest rank being assigned to the greatest difference, and ties being assigned their average rank.


• Next, the signs of the differences (±) are applied to their respective ranks, and the positive ranks
are summed.
• The computed statistic is then compared with the probability distribution of all possible
combinations of positive rank (bounded by 0 and the sum of all N ranks). For large data sets, N
> 10, the distribution can be approximated with a normal distribution.
(3) Kendall's tau measures the strength of monotonic relationships between two data sets (X, Y). A monotonic relationship exists if, as the X values increase, the dependent values of Y all increase or all decrease. Unlike the correlation coefficient, Kendall's test does not assume a linear relationship, but it does require that the data have equal variance.
• First, the data are sorted so that the X values are increasing (X_i < X_j, for i < j).
• Next the differences are computed for all combinations of Yj - Yi with i < j (there are N*(N-1)/2
possibilities).
• Positive and negative differences are counted separately and the test statistic, S, is the number of
positive differences minus the number of negative differences.
• The tau statistic, a measure of the data’s correlation, is found by dividing S by N*(N-1)/2, the
total number of step combinations. Thus tau will be +1 if all steps are positive, and –1 if all steps
are negative.
• Trend can be tested for by comparing the S statistic to the distribution of all possible
combinations of pluses and minuses (bounded by 0 and N*(N-1)/2).
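A short sketch of the same computation (synthetic series; scipy assumed) via `scipy.stats.kendalltau`; when there are no ties, the S statistic can be recovered from tau by multiplying by N(N−1)/2.

```python
import numpy as np
from scipy import stats

# Synthetic data with a weak monotonic relationship
x = np.arange(20, dtype=float)                       # e.g., years
rng = np.random.default_rng(2)
y = 0.3 * x + rng.normal(scale=2.0, size=x.size)     # e.g., annual mean flow

tau, p_value = stats.kendalltau(x, y)
S = tau * (x.size * (x.size - 1) / 2)                # positive minus negative steps (no ties assumed)
print(tau, S, p_value)
```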


5) Regression Analysis
a) Regression analysis tries to describe the relationship between two or more continuous variables by fitting a
linear model.
b) Ordinary Least Squares
i) Population model: y i = α + βxi + ε i
(1) Where α is the intercept, β is the slope, and εi is the random error between the data pair and the
model prediction.
ii) Estimated parameters: y i = a + bx i + ei
iii) Linear estimation: \tilde{y}_i = a + b x_i
iv) To find the parameters, minimize the sum of squares of the errors:
SSE = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N}(y_i - \tilde{y}_i)^2 = \sum_{i=1}^{N}(y_i - a - b x_i)^2, by solving \frac{dSSE}{da} = 0 and \frac{dSSE}{db} = 0 for a and b.

\frac{dSSE}{da} = \sum(-2y_i + 2a + 2b x_i) = 0
\frac{dSSE}{db} = \sum(-2x_i y_i + 2a x_i + 2b x_i^2) = 0

\sum(-y_i + a + b x_i) = 0
\sum(-y_i + a + b x_i)x_i = 0

\sum y_i = Na + b\sum x_i
\sum x_i y_i = a\sum x_i + b\sum x_i^2

Solve the first equation for a:
a = \frac{1}{N}\left[\sum y_i - b\sum x_i\right]
Then substitute into the second equation:
\sum x_i y_i = \frac{1}{N}\left[\sum y_i - b\sum x_i\right]\sum x_i + b\sum x_i^2
Solve the second equation for b:
b = \frac{\frac{1}{N}\left[N\sum x_i y_i - \sum x_i \sum y_i\right]}{\frac{1}{N}\left[N\sum x_i^2 - \left(\sum x_i\right)^2\right]}
v) Solution:
(1) b = \frac{S_{xy}}{S_{xx}}
(2) a = \bar{y} - b\bar{x}
(3) S^2 = \frac{S_{yy} - b S_{xy}}{n - 2}, the mean square error, or sample variance.
(4) r = b\sqrt{\frac{S_{xx}}{S_{yy}}}, the correlation coefficient.
(5) S_{xx} = \sum_{i=1}^{N}(x_i - \bar{x})^2, S_{yy} = \sum_{i=1}^{N}(y_i - \bar{y})^2, and S_{xy} = \sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})
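A minimal numerical sketch of these formulas (synthetic data; numpy assumed), with numpy's `polyfit` used as a cross-check:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, size=50)                    # explanatory variable
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=50)     # response with noise

Sxx = np.sum((x - x.mean())**2)
Syy = np.sum((y - y.mean())**2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

b = Sxy / Sxx                      # slope
a = y.mean() - b * x.mean()        # intercept
s2 = (Syy - b * Sxy) / (x.size - 2)  # mean square error
r = b * np.sqrt(Sxx / Syy)         # correlation coefficient

print(a, b, s2, r)
print(np.polyfit(x, y, 1))         # [b, a] from numpy, for comparison
```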


vi) The assumptions used in an OLS regression vary with the purpose for which it is to be used:

    Purpose: (1) predict Y given X; (2) predict Y and a variance for the prediction;
    (3) obtain the best linear unbiased estimator of Y; (4) test hypotheses, estimate confidence intervals.

    Assumption                                                      (1)   (2)   (3)   (4)
    Model form is correct: Y is linearly related to X                +     +     +     +
    Data used to fit the model are representative of the data
    of interest                                                      +     +     +     +
    Variance of the residuals is constant (homoscedastic)                  +     +     +
    The residuals are independent                                                +     +
    The residuals are normally distributed                                             +

c) General Least Squares


i) An ordinary least squares (OLS) regression, as described above, assumes that all variation in the
regression is a result of an imperfect model. The general least squares (GLS) regression also assumes that
there is variance between the sampled data and the actual population values.
ii) For an OLS model the best unbiased estimator of β is \hat{\beta}_{OLS} = (X^T X)^{-1} X^T \hat{Y}. The covariance matrix of \hat{\beta}_{OLS} is COV[\hat{\beta}_{OLS}] = \gamma^2 (X^T X)^{-1}, with the residual covariance assumed to be \Lambda = \gamma^2 I_N, where I_N is the identity matrix.
iii) For a GLS model the best unbiased estimator of β is \hat{\beta}_{GLS} = \left[X^T \Lambda(\gamma^2)^{-1} X\right]^{-1} X^T \Lambda(\gamma^2)^{-1}\hat{Y}. The covariance matrix of \hat{\beta}_{GLS} is COV[\hat{\beta}_{GLS}] = \left[X^T \Lambda(\gamma^2)^{-1} X\right]^{-1}.
iv) The covariance of \hat{Y} is E\left[(\hat{Y} - X\beta)(\hat{Y} - X\beta)^T\right] = \Lambda(\gamma^2) = \gamma^2 I_N + COV(\hat{Y}), where γ² is the model error variance, or residual unexplained variance, and COV(\hat{Y}) is the sampling covariance of Y around the unknown model value Xβ.
d) Multiple Regression
i) A multiple regression model tries to improve a model's explanation of variance by considering the
simultaneous effects of multiple explanatory variables, or functions thereof.
ii) Population model: y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i, i = 1…N. This can be rewritten using matrices as Y = XB + E.
(1) Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, X = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1N} & x_{2N} & \cdots & x_{kN} \end{bmatrix}, E = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}.
iii) Minimizing the sum of squared errors, E^T E = (Y - XB)^T(Y - XB), yields a solution for the parameter array B: B = (X^T X)^{-1} X^T Y.
iv) Problems arise when explanatory variables are too strongly correlated. This condition is called
multicollinearity, and will result in highly unstable regression coefficients (they will change dramatically
with small changes in X values).


v) Hypothesis testing can be used to determine whether one set of parameters is better than another, by testing \tilde{y}_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} versus the extended model \tilde{y}_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \beta_{k+1} x_{(k+1)i} + \dots + \beta_m x_{mi}, to determine which model explains more of the sample variance.

6) Trend Analysis
a) Trend analysis fits a linear regression to data, where time is one of the explanatory variables. The slope of the regression should be tested to see if it is significantly different from zero; if so, there is evidence to support a trend.
b) More advanced trend tests, like the Seasonal Kendall’s Test, will also check for seasonal trends.


7) Time Series Analysis


a) Time series analysis involves examining a set of time series data (e.g., streamflow records) for the effects of
previous time steps on the current step, and using that information to develop a stochastic model of the time
series. The model then can be used to predict such things as the storage necessary to provide a constant
demand level of water during a drought.
b) Stationary Time Series
i) A time series is stationary if it is free of trends, shifts, or periodicity. This implies that the statistical parameters of the series (mean, variance, …) are constant.
ii) Generally annual time series are stationary, but this will change under the influence of global change,
large volcanic eruptions, etc.
iii) Trends are gradual changes in a time series. An example would be a gradual warming of the air
temperature attributed to global warming.
iv) Shifts are sudden changes in a time series. An example would be the sudden decrease in air temperature
attributed to the presence of volcanic aerosols after an eruption.
v) Standardization requires removing the mean and variance from a time series. The standardized time series, y, is related to the measured time series, x, as y = \frac{x - \bar{x}}{S_x}. Shifts in the mean or variance require different values of mean and variance to be calculated from the data before and after the shift.
c) Correlation
i) Autocorrelation is a measure of how strongly the value at the current time step is related to the value of a
previous time step.
(1) The lag-k autocorrelation, ρ_k, can be estimated as the lag-k autocovariance, \gamma_k = COV[X_t X_{t+k}], divided by the lag-0 autocovariance, otherwise known as the variance:
\hat{\rho}_k = r_k = \frac{\gamma_{x_t x_{t+k}}}{\gamma_{x_t x_t}} = \frac{\frac{1}{N}\sum_{t=1}^{N-k}(x_t - \bar{x})(x_{t+k} - \bar{x})}{\frac{1}{N}\sum_{t=1}^{N}(x_t - \bar{x})^2}, k \ge 0.

[Figure: example autocorrelation plots, r(k) versus lag, for long-memory and short-memory Markov processes; the short-memory curve decays more rapidly with increasing lag.]
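A minimal sketch of the lag-k estimator above, applied to a synthetic lag-1 Markov (AR(1)) series (numpy assumed):

```python
import numpy as np

def autocorr(x, k):
    """Lag-k autocorrelation using the 1/N covariance estimator above."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    return np.sum(xm[:-k] * xm[k:]) / np.sum(xm**2) if k > 0 else 1.0

# Synthetic AR(1) series with phi_1 = 0.7
rng = np.random.default_rng(4)
x = np.zeros(1000)
for t in range(1, x.size):
    x[t] = 0.7 * x[t - 1] + rng.normal()

print([round(autocorr(x, k), 2) for k in range(6)])   # roughly 1, 0.7, 0.49, 0.34, ...
```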

ii) Cross-correlation is a measure of how strongly the value at the current time step in one data set, X, is
related to the value of a previous time step in another data set, Y.


(1) The lag-k cross-correlation, ρ_k, can be estimated as the lag-k cross-covariance, \gamma_k = COV(X_{t+k} Y_t), normalized by the lag-0 autocovariances of the two series:
\hat{\rho}_k = r_k = \frac{\gamma_{x_{t+k} y_t}}{(\gamma_{x_t x_t}\gamma_{y_t y_t})^{1/2}} = \frac{\frac{1}{N}\sum_{t=1}^{N-k}(x_{t+k} - \bar{x})(y_t - \bar{y})}{\frac{1}{N}\left[\sum_{t=1}^{N}(x_t - \bar{x})^2\sum_{t=1}^{N}(y_t - \bar{y})^2\right]^{1/2}}, k \ge 0.
iii) Seasonal correlations can also be computed, where the current season’s dependency on previous seasons
is checked.
iv) Plotting the computed correlation coefficients at various lags generates correlation plots. Correlation values vary from −1 to 1, while the lag-0 correlation is always 1. The rate at which the correlation values drop off with increasing lag indicates how dependent they are on the preceding time steps. For a lag-1 Markov process the autocorrelation decreases as a function of its lag-1 autocorrelation: r_k = r_1^k.
v) The partial autocorrelation function (PACF) for an autoregressive process is defined by
\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{bmatrix}\begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}, where \phi_{kk} is the PACF. The PACF is used to help detect the underlying model form. For an autoregressive model, \phi_{kk} = 0 for k > p, the order of the model. Therefore the order of the needed AR model can be estimated from a plot of the PACF.
d) Storage-Related Statistics
i) When attempting to model hydrologic time series for simulation of reservoir systems some understanding
of the long term flow must be attained. The Hurst coefficient is a measure of long-term persistence of
flow, based on the cumulative departures from mean flow.
• Let {X} be the sequence of flows: X1, X2, …, XN.
• Let k be the size of a subset of flows from {X}: Xi+1, Xi+2, …, Xi+k. k = 1, 2, …, N.
• Define:
D_k = \sum_{i=1}^{k} X_i - \frac{k}{N}\sum_{i=1}^{N} X_i, D_{max} = max(D_k), D_{min} = min(D_k), over all subsets of length k.
• The Hurst Range, R_N, for the record length N is R_N = D_{max} - D_{min}.
• The Range is then normalized to yield the rescaled range R_N^* = \frac{R_N}{S_N} \propto N^H, where S_N is the standard deviation of the flow sequence and H is known as the Hurst Coefficient.
• Hurst showed that for approximately 900 geophysical time series the Hurst coefficient has an average value of 0.73 and a standard deviation of 0.09. Theoretically, and numerically, H for a Markov model is 0.5. Thus natural systems tend to have a Hurst coefficient higher than such numerical models; this is known as the Hurst phenomenon. One interpretation is that H = 0.5 for short-memory models with a short-term dependence structure, while H > 0.5 for long-memory models with a long-term dependence structure.
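A rough sketch of the rescaled-range calculation for a single record (numpy assumed; a proper Hurst estimate would repeat this over many subset lengths and fit the slope of log R* versus log N):

```python
import numpy as np

def rescaled_range(x):
    """Rescaled range R*_N = (Dmax - Dmin) / S_N for one flow sequence."""
    x = np.asarray(x, dtype=float)
    n = x.size
    departures = np.cumsum(x) - (np.arange(1, n + 1) / n) * x.sum()  # D_k
    r_n = departures.max() - departures.min()
    return r_n / x.std(ddof=1)

rng = np.random.default_rng(5)
flows = rng.lognormal(mean=3.0, sigma=0.5, size=200)   # synthetic annual flows
print(rescaled_range(flows))
```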

e) Time-Series Models
i) A moving average model of order q, MA(q), is defined as: y_t = \mu + \varepsilon_t - \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}, where µ is the process mean, the ε's are uncorrelated, normally distributed random variables, and the θ's are the model parameters. The most commonly used form of this model is the MA(1): y_t = \mu + \varepsilon_t - \theta_1\varepsilon_{t-1}.
ii) An autoregressive or Markov model of order p, AR(p), is defined as: y_t = \mu + \sum_{i=1}^{p}\phi_i(y_{t-i} - \mu) + \varepsilon_t, where µ is the process mean; the ε's are uncorrelated, normally distributed random variables with zero mean and variance \sigma_\varepsilon^2; and the φ's are the model parameters. The most commonly used form of this model is the AR(1): y_t = \mu + \phi_1(y_{t-1} - \mu) + \varepsilon_t.
iii) Combining the AR(p) and MA(q) models yields the more versatile autoregressive moving average model, ARMA(p,q): y_t = \mu + \sum_{i=1}^{p}\phi_i(y_{t-i} - \mu) + \varepsilon_t - \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}, with p autoregressive parameters, φ_i, and q moving average parameters, θ_j. The most common form of this model is the ARMA(1,1): y_t = \mu + \phi_1(y_{t-1} - \mu) + \varepsilon_t - \theta_1\varepsilon_{t-1}.
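A small sketch of the AR(1) case (numpy assumed; the sample statistics below are hypothetical): the lag-1 sample autocorrelation provides a simple estimate of φ_1, which can then be used to generate a synthetic series, e.g., for reservoir simulation.

```python
import numpy as np

rng = np.random.default_rng(6)

# Suppose an observed annual-flow series gives these sample statistics
mu_hat, s_hat, r1 = 120.0, 35.0, 0.45       # mean, std dev, lag-1 autocorrelation
phi1 = r1                                    # AR(1): phi_1 = rho_1
sigma_eps = s_hat * np.sqrt(1.0 - phi1**2)   # innovation std dev preserving the variance

# Generate a synthetic 100-year AR(1) sequence
y = np.empty(100)
y[0] = mu_hat
for t in range(1, y.size):
    y[t] = mu_hat + phi1 * (y[t - 1] - mu_hat) + rng.normal(scale=sigma_eps)

print(y.mean(), y.std(ddof=1))               # should be near mu_hat and s_hat
```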
iv) Model Properties:
(1) The backwards operator, B, is defined as: B^j Z_t = Z_{t-j}.
(2) The moving average parameters are generated using \theta(B) = 1 - \sum_{j=1}^{q}\theta_j B^j.
(3) The autoregressive parameters are generated using \phi(B) = \sum_{j=0}^{p}\phi_j B^j, where \phi_0 = 1. φ is also related to the partial autocorrelation function, \phi_{kk}, previously defined. A useful property of the PACF is that \phi_{kk} = 0 for k > p.
(4) \theta(B) = \phi^{-1}(B).

20
Statistics Notes

8) Probability of Extremes
a) The modeling of extreme events is of major importance to the general population. Levees and bridges need to
be able to withstand some of the most extreme flooding events. The probability of extreme events can be used
to predict the magnitude of floods with 50, 100, … year return periods, from data sets of peaks-over-threshold,
or annual maximums.

9) Analysis of Variance
a) ANOVA (ANalysis Of VAriance)
i) Analysis of variance (ANOVA) tests k independent groups, or treatments, for similarity in their means
(H0: µ0 = µ1 = … = µk).

                              Treatment
             1       2       …       i       …       k
             y_11    y_21    …       y_i1    …       y_k1
             y_12    y_22    …       y_i2    …       y_k2
             …       …               …               …
             y_1N    y_2N    …       y_iN    …       y_kN
    Total    T_1.    T_2.    …       T_i.    …       T_k.    T_..
    Mean     ȳ_1.    ȳ_2.    …       ȳ_i.    …       ȳ_k.    ȳ_..

ii) The one-way classification analysis-of-variance model is y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, for each y in i = 1…k treatments and j = 1…N measurements per treatment. The model is subject to the constraint \sum_{i=1}^{k}\alpha_i = 0.

iii) Sum-of-Squares Identity: \sum_{i=1}^{k}\sum_{j=1}^{N}(y_{ij} - \bar{y}_{..})^2 = N\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{k}\sum_{j=1}^{N}(y_{ij} - \bar{y}_{i.})^2, where SST is the total sum of squares, SSA is the treatment sum of squares, and SSE is the error sum of squares, so the identity can be rewritten as SST = SSA + SSE.
iv) The ANOVA test statistic is computed as f = s_1^2/s_2^2, where s_1^2 = SSA/(k-1), and s_2^2 = SSE/[k(N-1)]. This is tested as an F-test with \nu_1 = k-1, and \nu_2 = k(N-1).
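For reference, the same one-way test is available directly in a library routine (hypothetical treatment groups; scipy assumed):

```python
from scipy import stats

# Hypothetical yields under three treatments (k = 3 groups)
t1 = [5.1, 4.8, 5.6, 5.0, 4.9]
t2 = [5.9, 6.1, 5.7, 6.3, 5.8]
t3 = [5.2, 5.4, 5.1, 5.5, 5.0]

f_stat, p_value = stats.f_oneway(t1, t2, t3)
print(f_stat, p_value)   # reject H0 of equal means if p_value < alpha
```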
v) Single-Degree-of-Freedom Comparisons are made within an ANOVA by using the contrast sum of squares. A comparison or contrast in the treatment means is any linear function of the form \omega = \sum_{i=1}^{k} c_i\mu_i, where \sum_{i=1}^{k} c_i = 0. SSw, the contrast sum of squares, is defined as SSw = \left(\sum_{i=1}^{k} c_i T_{i.}\right)^2 \Big/ \left(N\sum_{i=1}^{k} c_i^2\right), and it indicates the portion of SSA that is explained by the contrast in question. To test multiple contrasts, they must be orthogonal: \sum_{i=1}^{k} b_i c_i n_i = 0, where b_i and c_i are coefficients of different contrast functions.
b) Regression Approach to ANOVA
i) The one-way ANOVA in matrix notation is:


\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1N} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2N} \\ \vdots \\ y_{k1} \\ y_{k2} \\ \vdots \\ y_{kN} \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{bmatrix} +
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{1N} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{2N} \\ \vdots \\ \varepsilon_{k1} \\ \varepsilon_{k2} \\ \vdots \\ \varepsilon_{kN} \end{bmatrix}

ii) Apply the least squares approach (Ab = g, where A = X^T X and g = X^T Y) to the ANOVA model. The normal equations are given by:
\begin{bmatrix} Nk & N & N & \cdots & N \\ N & N & 0 & \cdots & 0 \\ N & 0 & N & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ N & 0 & 0 & \cdots & N \end{bmatrix}\begin{bmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \\ \vdots \\ \hat{\alpha}_k \end{bmatrix} = \begin{bmatrix} T_{..} \\ T_{1.} \\ T_{2.} \\ \vdots \\ T_{k.} \end{bmatrix}
iii) This matrix is singular (the last k rows of the (k+1)×(k+1) matrix add up to the top row), so the parameters are not estimable. However, the α's as used by the ANOVA model are actually deviations of the treatment means from the overall mean, µ. Therefore testing the equality of population means is equivalent to testing that the α_i's are all zero.
iv) Using the constraint that all α_i's sum to zero, the matrix form of the model can be rewritten as:
\begin{bmatrix} 0 & N & N & \cdots & N \\ N & N & 0 & \cdots & 0 \\ N & 0 & N & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ N & 0 & 0 & \cdots & N \end{bmatrix}\begin{bmatrix} \hat{\mu} \\ \hat{\alpha}_1 \\ \hat{\alpha}_2 \\ \vdots \\ \hat{\alpha}_k \end{bmatrix} = \begin{bmatrix} 0 \\ T_{1.} \\ T_{2.} \\ \vdots \\ T_{k.} \end{bmatrix}
which can be solved, yielding the estimating equations \hat{\mu} = \frac{T_{..}}{Nk} = \bar{y}_{..}, and \hat{\alpha}_i = \frac{T_{i.}}{N} - \frac{T_{..}}{Nk} = \bar{y}_{i.} - \bar{y}_{..}, i = 1, 2, …, k.
c) Standard Regression


i) The amount of variance that can be described by a linear regression is determined by the correlation coefficient, r. The value of r² × 100 is the percentage of the total variance described by the linear regression. An r of 1 or −1 (r² of 1) indicates that all variance is described by a linear model.

10) Bibliography

(1975). Flood Studies Report: Volume I, Hydrological Studies. London, Natural Environment Research
Council.

Benjamin, J. R., and C. Allin Cornell (1970). Probability, Statistics and Decision for Civil Engineers. San Francisco, McGraw-Hill Book Company.

Bunt, L. N., and Alan Barton (1967). Probability and Hypothesis Testing. Toronto, George G. Harrap and Co. Ltd.

Draper, N. R., and H. Smith (1966). Applied Regression Analysis. New York, John Wiley and Sons, Inc.

Maidment, D. R., Ed. (1993). Handbook of Hydrology. San Francisco, McGraw-Hill, Inc.

Meyers, B. L., and Norbert L. Enrick (1970). Statistical Functions: A Source of Practical Derivations Based
on Elementary Mathematics. Kent State, Kent State University Press.

Shaw, E. M. (1983). Hydrology in Practice. London, Van Nostrand Reinhold (UK) Co. Ltd.
