
CHAPTER 5 PROBABILITY AND STATISTICS

Definition of statistics: the mathematics of the collection, organization and interpretation of numerical data, especially the analysis of a population by inference from sampling.

Let $P(A)$ denote the probability of an event $A$, which is a subset of a sample space $S$.

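5.1 Rules of probability

1. Complement rule: $P(A^c) = 1 - P(A)$
2. Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
3. For disjoint events: $P(A \cup B) = P(A) + P(B)$
4. Product rule: $P(A \cap B) = P(A)\,P(B \mid A)$. Thus, if $A$ and $B$ are independent, $P(A \cap B) = P(A)\,P(B)$.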

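5.2 Conditional probability

(a) If $A$ and $B$ are any events with $P(B) > 0$, then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
(b) If $A$ and $B$ are any events with $P(B) > 0$, and $A$ and $B$ are independent, then
$$P(A \mid B) = P(A)$$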

5.3 Multiplication rule

If $A$ and $B$ are any events, then
$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

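5.4 Total probability rule

If $E_1, E_2, \ldots, E_k$ are mutually exclusive and exhaustive events, then for any event $A$
$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_k)$$
$$= P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \cdots + P(A \mid E_k)P(E_k)$$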

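5.5 Bayes Theorem

Suppose $E_1, E_2, \ldots, E_k$ are mutually exclusive events, exactly one of which must occur. Let $A$ be an event with $P(A) > 0$. Then, for $r = 1, 2, \ldots, k$,
$$P(E_r \mid A) = \frac{P(A \mid E_r)\,P(E_r)}{\sum_{i=1}^{k} P(A \mid E_i)\,P(E_i)}$$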

Example 5.1 Three machines produce similar car parts. Machine A produces 40% of the total output, and machines B and C produce 25% and 15% respectively. The proportions of the output from each machine that do not conform to the specification are 10% for A, 5% for B and 1% for C. What proportion of the parts that do not conform to the specification are produced by machine A?
Solution
Let $D$ represent the event that a particular part is defective (does not conform). By the total probability rule, the overall proportion of defective parts is
$$P(D) = P(D \mid A)P(A) + P(D \mid B)P(B) + P(D \mid C)P(C)$$
where $P(D \mid A) = 0.10$, $P(D \mid B) = 0.05$ and $P(D \mid C) = 0.01$.
Using Bayes theorem,
$$P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D)}$$
Example 5.2 Suppose that 0.1% of the people in a certain area have a disease D, and that a mass screening test is used to detect cases. The test gives either a positive or a negative result for each person. In practice, the test gives a positive result with probability 99.9% for a person who has D, and with probability 0.2% for a person who does not. What is the probability that a person whose test is positive actually has the disease?

Solution
Let T represent the event that the test gives a positive result.
Then,
$$P(T) = P(T \mid D)P(D) + P(T \mid D^c)P(D^c) = (0.999)(0.001) + (0.002)(0.999) = 0.002997$$
Using Bayes theorem,
$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T)} = \frac{0.000999}{0.002997} = \frac{1}{3} \approx 0.333$$
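The total probability rule and Bayes theorem combine into a few lines of code. The sketch below assumes the events $E_i$ are given as two parallel lists: priors $P(E_i)$ and likelihoods $P(A \mid E_i)$. The function name `posterior` is illustrative and not from the text. Applied to the numbers of Example 5.2, it reproduces $P(D \mid T) = 1/3$.

```python
def posterior(priors, likelihoods):
    """Return P(E_r | A) for every r, given priors P(E_r) and likelihoods P(A | E_r).

    Assumes E_1..E_k are mutually exclusive and exhaustive, so priors sum to 1.
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (events must be exhaustive)")
    # Numerators P(A | E_i) P(E_i); their sum is P(A) by the total probability rule
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_a = sum(joint)
    # Bayes theorem: P(E_r | A) = P(A | E_r) P(E_r) / P(A)
    return [j / p_a for j in joint]


# Example 5.2: E_1 = "has D", E_2 = "does not have D", A = "test is positive"
post = posterior(priors=[0.001, 0.999], likelihoods=[0.999, 0.002])
print(round(post[0], 4))  # 0.3333
```

Checking that the priors sum to 1 guards against using the rule with events that are not exhaustive. If they are not exhaustive, the denominator is no longer $P(A)$.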

5.6 Random variables

A random variable (rv) has a sample space of possible numerical values together with a distribution of probabilities. Examples: (a) the number of defectives in a process; (b) the number of successful projects.
Random variables can be discrete or continuous.

Discrete random variables and distributions
Definition
If $X$ is a discrete random variable, then $p(x) = P(X = x)$ is called a probability mass function or probability distribution if, for each outcome $x$,
(a) $p(x) \ge 0$
(b) $\sum_x p(x) = 1$

Cumulative distribution functions

The cumulative distribution function $F(x)$ for a discrete random variable $X$ with probability distribution $p(x) = P(X = x)$ is
$$F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$

Properties of the cumulative distribution functions

$F(x)$ satisfies the following properties:
(a) $F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$
(b) $0 \le F(x) \le 1$
(c) If $x \le y$, then $F(x) \le F(y)$

Mean of a discrete random variable

If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \sum_x x\,p(x)$$

Variance of a discrete random variable

If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \sum_x (x - \mu_X)^2\,p(x) = \sum_x x^2\,p(x) - \mu_X^2$$

Standard deviation of a discrete random variable

The standard deviation of a discrete random variable, denoted by $\sigma_X$, is the positive square root of the variance: $\sigma_X = \sqrt{\sigma_X^2}$.
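Example 5.3
The number of successful projects $X$ per day obtained by a small engineering firm can be described by the following probability distribution:
$$P(X = x) = \begin{cases} \dfrac{x}{10}, & x = 0, 1, 2, 3, 4 \\ 0, & \text{otherwise} \end{cases}$$
Find the cumulative distribution function for $X$. Find the mean and variance for the number of successful projects per day.
Solution
The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$
For $x = 0$: $F(0) = P(X \le 0) = P(0) = \frac{0}{10} = 0$
For $x = 1$: $F(1) = P(X \le 1) = P(0) + P(1) = 0 + \frac{1}{10} = 0.1$
For $x = 2$: $F(2) = P(X \le 2) = P(0) + P(1) + P(2) = 0 + \frac{1}{10} + \frac{2}{10} = 0.3$
For $x = 3$: $F(3) = P(X \le 3) = P(0) + P(1) + P(2) + P(3) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = 0.6$
For $x = 4$: $F(4) = P(X \le 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = 1.0$
The mean is
$$\mu_X = \sum_x x\,p(x) = \frac{0 + 1 + 4 + 9 + 16}{10} = 3$$
and the variance is
$$\sigma_X^2 = \sum_x x^2\,p(x) - \mu_X^2 = \frac{0 + 1 + 8 + 27 + 64}{10} - 3^2 = 10 - 9 = 1$$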
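Example 5.3 can be checked with exact fractions, so the CDF values, mean and variance come out without rounding. The dictionary `pmf` and the helper `cdf` are illustrative names for this sketch.

```python
from fractions import Fraction

# PMF from Example 5.3: P(X = x) = x/10 for x = 0, 1, 2, 3, 4
pmf = {x: Fraction(x, 10) for x in range(5)}
assert sum(pmf.values()) == 1  # property (b) of a probability mass function


def cdf(x):
    """F(x) = P(X <= x): sum of p(t) over support points t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)


mean = sum(x * p for x, p in pmf.items())                   # E(X)
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2  # E(X^2) - mu^2

print([float(cdf(x)) for x in range(5)])  # [0.0, 0.1, 0.3, 0.6, 1.0]
print(mean, variance)                     # 3 1
```

Because `cdf` sums over all support points up to `x`, it also handles non-integer arguments. For example, `cdf(2.5)` equals $F(2) = 0.3$, which reflects the step shape of a discrete CDF.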

5.7 Continuous random variables and distributions

Definition
If $X$ is a continuous random variable defined over a set of real numbers, then $f(x)$ is called a probability density function if
(a) $f(x) \ge 0$
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(c) $P(a \le X \le b) = \int_a^b f(x)\,dx$ for any interval $[a, b]$

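Cumulative distribution functions

The cumulative distribution function $F(x)$ for a continuous random variable $X$ with probability density function $f(x)$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{for } -\infty < x < \infty$$

Properties of the cumulative distribution functions

$F(x)$ satisfies the following properties:
(a) $P(X \le a) = \int_{-\infty}^{a} f(t)\,dt = F(a)$ for $-\infty < a < \infty$
(b) $P(X \ge a) = \int_{a}^{\infty} f(t)\,dt = 1 - F(a)$ for $-\infty < a < \infty$
(c) $P(a \le X \le b) = \int_a^b f(t)\,dt = F(b) - F(a)$ for $a < b$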

Mean of a continuous random variable

If $X$ is a continuous random variable with probability density function $f(x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

Variance of a continuous random variable

If $X$ is a continuous random variable with probability density function $f(x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2$$

Standard deviation of a continuous random variable

The standard deviation of a continuous random variable, denoted by $\sigma_X$, is the positive square root of the variance: $\sigma_X = \sqrt{\sigma_X^2}$.

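Example 5.4 Assume that the particle size of an air pollutant (in micrometers) can be described by the following probability density function:
$$f_X(x) = \begin{cases} \dfrac{3}{x^4}, & x \ge 1 \\ 0, & \text{otherwise} \end{cases}$$
(a) Show that $f(x)$ is a probability density function.
(b) Find the cumulative distribution function.
(c) Determine the mean and standard deviation.
Solution
(a) $f(x)$ is a probability density function if $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Clearly $f(x) \ge 0$. Here
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_1^{\infty} \frac{3}{x^4}\,dx = \left[-x^{-3}\right]_1^{\infty} = 0 - (-1) = 1$$
Therefore $f(x)$ is a probability density function.

(b) The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_1^{x} \frac{3}{t^4}\,dt = \left[-t^{-3}\right]_1^{x} = 1 - \frac{1}{x^3} \quad \text{for } x \ge 1$$
and $F_X(x) = 0$ for $x < 1$.

(c) The mean of $X$ is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_1^{\infty} x \cdot \frac{3}{x^4}\,dx = \int_1^{\infty} \frac{3}{x^3}\,dx = \left[-\frac{3}{2x^2}\right]_1^{\infty} = \frac{3}{2} \text{ micrometers}$$
The variance of $X$ is given by
$$V(X) = \sigma_X^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2 = \int_1^{\infty} \frac{3}{x^2}\,dx - \left(\frac{3}{2}\right)^2 = \left[-\frac{3}{x}\right]_1^{\infty} - \frac{9}{4} = 3 - \frac{9}{4} = \frac{3}{4} \text{ sq. micrometers}$$
so the standard deviation is $\sigma_X = \sqrt{3/4} \approx 0.866$ micrometers.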
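The integrals in Example 5.4 can be cross-checked numerically. The sketch below uses a hypothetical helper, `integrate_tail`, written for this illustration. It maps $[1, \infty)$ onto $(0, 1]$ with the substitution $u = 1/x$ (so $dx = -du/u^2$) and applies the composite midpoint rule. It is a numerical check, not a replacement for the exact integration above.

```python
def integrate_tail(g, n=100_000):
    """Approximate the integral of g(x) over [1, infinity).

    Substituting u = 1/x turns it into the integral of g(1/u) / u**2 over (0, 1],
    which is then evaluated with the composite midpoint rule on n subintervals.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += g(1.0 / u) / u**2
    return total * h


def f(x):
    """Density from Example 5.4, valid for x >= 1."""
    return 3.0 / x**4


area = integrate_tail(f)                          # should be 1
mean = integrate_tail(lambda x: x * f(x))         # should be 3/2
second = integrate_tail(lambda x: x**2 * f(x))    # E(X^2), should be 3
variance = second - mean**2                       # should be 3/4
print(round(area, 6), round(mean, 6), round(variance, 6))  # 1.0 1.5 0.75
```

For this particular density, the transformed integrands are the polynomials $3u^2$, $3u$ and $3$. The midpoint rule is therefore essentially exact here. Heavier-tailed densities would need more care.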
5.8 Discrete distributions
Bernoulli distribution
PMF: $P(X = x) = p^x (1 - p)^{1 - x}$
Range: $x = 0, 1$ and $0 \le p \le 1$
Mean: $p$
Variance: $p(1 - p)$

Binomial distribution
PMF: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
Range: $x = 0, 1, \ldots, n$ and $0 \le p \le 1$
Parameters: $n$ and $p$
Mean: $np$
Variance: $np(1 - p)$

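Example 5.5 Suppose a road is flooded with probability $p$ during a year, and not more than one flood occurs during a year. What is the probability that it will be flooded at least once during a five-year period?
Solution
Let $X$ be the number of years, out of five, in which a flood occurs. Assuming that floods in different years are independent, $X$ has a binomial distribution with $n = 5$ and parameter $p$.
Then,
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{5}{0} p^0 (1 - p)^5 = 1 - (1 - p)^5$$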
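The binomial PMF and the "at least once" calculation translate directly into Python's standard library (`math.comb` requires Python 3.8+). The yearly flood probability for Example 5.5 is not given in the text. The value `p_year = 0.1` below is a hypothetical choice made only to illustrate the formula $1 - (1 - p)^5$.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_at_least_one(n, p):
    """P(X >= 1) = 1 - P(X = 0) for X ~ Binomial(n, p)."""
    return 1 - binomial_pmf(0, n, p)


# Hypothetical yearly flood probability, for illustration only
p_year = 0.1
print(round(prob_at_least_one(5, p_year), 5))  # 1 - 0.9**5 = 0.40951
```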

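Poisson distribution
PMF: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
Range: $x = 0, 1, 2, \ldots$
Parameter: $\lambda > 0$
Mean: $\lambda$
Variance: $\lambda$

If $n$ is large and $p$ is small, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$.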
Example 5.6 The number of flaws in a thin copper wire follows a Poisson distribution with a mean of 2.3 flaws per mm. (a) Determine the probability of exactly two flaws in 1 mm of wire. (b) Determine the probability of exactly ten flaws in 5 mm of wire.
Solution
(a) Let $X$ be the number of flaws in 1 mm of wire.
Given that $\lambda = E(X) = 2.3$ flaws,
$$P(X = 2) = \frac{e^{-2.3}\,2.3^2}{2!} = 0.265$$
(b) Let $X$ be the number of flaws in 5 mm of wire. Then $X$ has a Poisson distribution with $\lambda = E(X) = 5 \times 2.3 = 11.5$ flaws, so
$$P(X = 10) = \frac{e^{-11.5}\,11.5^{10}}{10!} = 0.113$$
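A minimal Poisson sketch for Example 5.6, using only the standard library. The key step is that the rate scales with wire length: 2.3 flaws per mm gives $\lambda = 2.3$ for 1 mm and $\lambda = 11.5$ for 5 mm.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)


rate_per_mm = 2.3                         # mean number of flaws per mm of wire
p_a = poisson_pmf(2, rate_per_mm * 1)     # (a) 1 mm of wire, lambda = 2.3
p_b = poisson_pmf(10, rate_per_mm * 5)    # (b) 5 mm of wire, lambda = 11.5
print(round(p_a, 4), round(p_b, 4))       # 0.2652 0.1129
```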

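5.9 Continuous distributions

Normal distribution
PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$
Range: $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Parameters: $\mu$ is the location parameter and $\sigma$ is the scale parameter.

If $X$ follows a normal distribution, then $E(X) = \mu$ and $V(X) = \sigma^2$, written $X \sim N(\mu, \sigma^2)$.
Also, $Z = \dfrac{X - \mu}{\sigma}$ follows the standard normal distribution $N(0, 1)$.

[Figure: normal probability density function $f(x)$ plotted against $x$]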

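5.10 Sample measures and parameter estimates

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then the point estimates for $\mu$ and $\sigma^2$ are
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = s^2$$
where
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
is the sample mean, and
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
is the sample variance.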
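The two point estimates can be written directly from their definitions. The sketch below uses a small made-up sample, not data from the text. It cross-checks the result against Python's `statistics.variance`, which also divides by $n - 1$.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (x_1 + ... + x_n) / n"""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((x_i - x-bar)^2) / (n - 1)."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample, not from the text
print(sample_mean(data), round(sample_variance(data), 4))  # 5.0 4.5714

# The standard library uses the same n - 1 divisor
assert math.isclose(sample_variance(data), statistics.variance(data))
```

Dividing by $n - 1$ rather than $n$ makes $s^2$ an unbiased estimator of $\sigma^2$. This is why the examples below divide sums of squares by 39 and 9 for samples of size 40 and 10.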

5.11 Confidence interval for the mean based on the normal distribution
(1) Population variance is known
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution, which is given in Table 1.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can be either small or large.

(2) Population variance is unknown
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(b) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution, which is given in Table 1.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.

Table 1: Cumulative distribution function for the standard normal distribution
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$$
(Row: $z$ to one decimal place; column: second decimal place of $z$.)

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
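Entries of Table 1 can be reproduced with the identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, using only Python's standard library. This is an illustrative sketch. `statistics.NormalDist` (Python 3.8+) also provides `cdf` and `inv_cdf`. The latter shows that the value $z_{0.05} = 1.65$ used in the examples below is a table rounding of $1.645$.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Reproduce a few Table 1 entries (row = first decimal, column = second decimal)
for z in (0.0, 1.28, 1.65, 1.96, 2.57):
    print(f"{z:.2f}  {std_normal_cdf(z):.4f}")

# Upper 5% point of N(0, 1), i.e. z_{0.05}
print(round(NormalDist().inv_cdf(0.95), 3))  # 1.645
```

The loop prints 0.5000, 0.8997, 0.9505, 0.9750 and 0.9949. These match rows 0.0, 1.2, 1.6, 1.9 and 2.5 of Table 1 in columns .00, .08, .05, .06 and .07 respectively.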

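Example 5.7
A study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.

15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62

Find a 90% confidence interval for the true mean wind speed in Penang.

Solution
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$) and the population variance is unknown, the following confidence interval is used.
The 90% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.05}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05}\frac{S}{\sqrt{n}}$$
$$14.953 - 1.65\left(\frac{4.489}{\sqrt{40}}\right) \le \mu \le 14.953 + 1.65\left(\frac{4.489}{\sqrt{40}}\right)$$
$$14.953 - 1.65(0.710) \le \mu \le 14.953 + 1.65(0.710)$$
$$14.953 - 1.172 \le \mu \le 14.953 + 1.172$$
$$13.781 \le \mu \le 16.125$$
Thus the 90% confidence interval for the true mean wind speed in Penang is $(13.781, 16.125)$ m/s.

Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40}(X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 20.149, \quad S = 4.489$$
From Table 1, $z_{0.05} = 1.65$.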
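The large-sample interval of Example 5.7 can be computed from the raw data with the standard `statistics` module. This is a sketch. The function name `z_interval` is illustrative, and the critical value is passed in by hand from Table 1 rather than computed.

```python
import math
import statistics


def z_interval(xs, z):
    """Large-sample CI for the mean: x-bar -/+ z * S / sqrt(n).

    z is the upper alpha/2 point of N(0, 1), e.g. read from Table 1.
    """
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, n - 1 divisor
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


wind = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]
lo, hi = z_interval(wind, z=1.65)  # 90% interval: z_{0.05} = 1.65 from Table 1
print(round(lo, 3), round(hi, 3))
```

This prints 13.782 16.124. The last digit differs slightly from the hand calculation, which rounds $S/\sqrt{n}$ to 0.710 before multiplying.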

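Example 5.8
The flow discharge of Sungai Kerian (measured in m³/s) was sampled at random. Fifty readings were collected, and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Construct a 99% confidence interval for the true mean flow discharge of Sungai Kerian.
Solution
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following confidence interval is used.
The 99% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.005}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.005}\frac{S}{\sqrt{n}}$$
$$3.512 - 2.57\left(\frac{0.5}{\sqrt{50}}\right) \le \mu \le 3.512 + 2.57\left(\frac{0.5}{\sqrt{50}}\right)$$
$$3.512 - 2.57(0.071) \le \mu \le 3.512 + 2.57(0.071)$$
$$3.512 - 0.182 \le \mu \le 3.512 + 0.182$$
$$3.330 \le \mu \le 3.694$$
Thus the 99% confidence interval for the true mean flow discharge of Sungai Kerian is $(3.330, 3.694)$ m³/s.

Calculations
$\bar{X} = 3.512$, $n = 50$, $S = 0.5$. From Table 1, $z_{0.005} = 2.57$.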
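When only summary statistics are available, as in Example 5.8, the same interval needs just $\bar{X}$, $S$ and $n$. This is a minimal sketch. The helper name is illustrative, and $z_{0.005} = 2.57$ is taken from Table 1 as in the text.

```python
import math


def z_interval_from_summary(xbar, s, n, z):
    """Large-sample CI for the mean from summary statistics: x-bar -/+ z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 5.8: 99% interval, so alpha/2 = 0.005 and z_{0.005} = 2.57 (Table 1)
lo, hi = z_interval_from_summary(xbar=3.512, s=0.5, n=50, z=2.57)
print(round(lo, 3), round(hi, 3))  # 3.33 3.694
```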

5.12 Confidence intervals for the mean based on the t distribution

The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $S$ is the sample standard deviation.
(c) $t_{\alpha/2, n-1}$ is the $100(1 - \alpha/2)$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical values of the t distribution are given in Table 2.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.

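Example 5.9
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.

1.81  2.00  2.74  3.56  2.13
4.64  3.64  4.62  4.47  3.12

Construct a 98% confidence interval for the true mean moisture content of clay, assuming that the sample is from a normal distribution.
Solution
Let $\mu$ be the true mean moisture content (in percentage) of clay.
Since the sample size is small ($n = 10$), the following confidence interval is used.
The 98% confidence interval for the true population mean is given by
$$\bar{X} - t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}}$$
$$\bar{X} - t_{0.01, 9}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{0.01, 9}\frac{S}{\sqrt{n}}$$
$$3.273 - 2.821\left(\frac{1.091}{\sqrt{10}}\right) \le \mu \le 3.273 + 2.821\left(\frac{1.091}{\sqrt{10}}\right)$$
$$3.273 - 2.821(0.345) \le \mu \le 3.273 + 2.821(0.345)$$
$$3.273 - 0.973 \le \mu \le 3.273 + 0.973$$
$$2.300 \le \mu \le 4.246$$
Thus the 98% confidence interval for the true mean moisture content of clay is $(2.300, 4.246)$ percent.

Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10}(X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.091^2 = 1.190$$
From Table 2, $t_{0.01, 9} = 2.821$.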
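The small-sample t interval of Example 5.9 differs from the z interval only in its critical value. The sketch below keeps to the standard library and takes $t_{0.01, 9} = 2.821$ from Table 2. If SciPy happens to be available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` returns the same critical value, but that dependency is not assumed here. The helper name `t_interval` is illustrative.

```python
import math
import statistics


def t_interval(xs, t_crit):
    """Small-sample CI for a normal mean: x-bar -/+ t * S / sqrt(n).

    t_crit is t_{alpha/2, n-1}, read from a t table such as Table 2.
    """
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, n - 1 divisor
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]
lo, hi = t_interval(moisture, t_crit=2.821)  # 98% CI: t_{0.01, 9} = 2.821
print(round(lo, 3), round(hi, 3))            # 2.3 4.246
```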

Table 2: Critical values $t_{\alpha, \nu}$ of the t distribution with $\nu$ degrees of freedom ($\alpha$ is the upper-tail probability)

ν      0.40   0.30   0.20   0.15   0.10   0.05   0.025   0.02    0.015   0.01
1      0.325  0.727  1.376  1.963  3.078  6.314  12.706  15.895  21.205  31.821
2      0.289  0.617  1.061  1.386  1.886  2.920  4.303   4.849   5.643   6.965
3      0.277  0.584  0.978  1.250  1.638  2.353  3.182   3.482   3.896   4.541
4      0.271  0.569  0.941  1.190  1.533  2.132  2.776   2.999   3.298   3.747
5      0.267  0.559  0.920  1.156  1.476  2.015  2.571   2.757   3.003   3.365
6      0.265  0.553  0.906  1.134  1.440  1.943  2.447   2.612   2.829   3.143
7      0.263  0.549  0.896  1.119  1.415  1.895  2.365   2.517   2.715   2.998
8      0.262  0.546  0.889  1.108  1.397  1.860  2.306   2.449   2.634   2.897
9      0.261  0.543  0.883  1.100  1.383  1.833  2.262   2.398   2.574   2.821
10     0.260  0.542  0.879  1.093  1.372  1.813  2.228   2.359   2.528   2.764
11     0.260  0.540  0.876  1.088  1.363  1.796  2.201   2.328   2.491   2.718
12     0.259  0.539  0.873  1.083  1.356  1.782  2.179   2.303   2.461   2.681
13     0.259  0.538  0.870  1.080  1.350  1.771  2.160   2.282   2.436   2.650
14     0.258  0.537  0.868  1.076  1.345  1.761  2.145   2.264   2.415   2.625
15     0.258  0.536  0.866  1.074  1.341  1.753  2.131   2.249   2.397   2.603
16     0.258  0.535  0.865  1.071  1.337  1.746  2.120   2.235   2.382   2.584
17     0.257  0.534  0.863  1.069  1.333  1.740  2.110   2.224   2.368   2.567
18     0.257  0.534  0.862  1.067  1.330  1.734  2.101   2.214   2.356   2.552
19     0.257  0.533  0.861  1.066  1.328  1.729  2.093   2.205   2.346   2.540
20     0.257  0.533  0.860  1.064  1.325  1.725  2.086   2.197   2.336   2.528
21     0.257  0.532  0.859  1.063  1.323  1.721  2.080   2.189   2.328   2.518
22     0.256  0.532  0.858  1.061  1.321  1.717  2.074   2.183   2.320   2.508
23     0.256  0.532  0.858  1.060  1.320  1.714  2.069   2.177   2.313   2.500
24     0.256  0.531  0.857  1.059  1.318  1.711  2.064   2.172   2.307   2.492
25     0.256  0.531  0.856  1.058  1.316  1.708  2.060   2.167   2.301   2.485
26     0.256  0.531  0.856  1.058  1.315  1.706  2.056   2.162   2.296   2.479
27     0.256  0.531  0.855  1.057  1.314  1.703  2.052   2.158   2.291   2.473
28     0.256  0.530  0.855  1.056  1.313  1.701  2.048   2.154   2.286   2.467
29     0.256  0.530  0.854  1.055  1.311  1.699  2.045   2.150   2.282   2.462
30     0.256  0.530  0.854  1.055  1.310  1.697  2.042   2.147   2.278   2.457
40     0.255  0.529  0.851  1.050  1.303  1.684  2.021   2.123   2.250   2.423
60     0.254  0.527  0.848  1.046  1.296  1.671  2.000   2.099   2.223   2.390
120    0.254  0.526  0.845  1.041  1.289  1.658  1.980   2.076   2.196   2.358
∞      0.253  0.524  0.842  1.036  1.282  1.645  1.960   2.054   2.170   2.326

5.13 Tests of hypotheses for the mean based on the normal distribution
(1) Population variance is known

One-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu \ne d_0$

Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{\sigma^2 / n}}$$

Rejection region
Reject $H_0$ if
One-tail tests: $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two-tail tests: $|Z| > z_{\alpha/2}$

Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $z_{\alpha}$ and $z_{\alpha/2}$ are the $100(1 - \alpha)$th and $100(1 - \alpha/2)$th quantiles of the standard normal distribution, which are given in Table 1.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can be either small or large.

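(2) Population variance is unknown

One-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu \ne d_0$

Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$

Rejection region
Reject $H_0$ if
One-tail tests: $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two-tail tests: $|Z| > z_{\alpha/2}$

Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(c) $z_{\alpha}$ and $z_{\alpha/2}$ are the $100(1 - \alpha)$th and $100(1 - \alpha/2)$th quantiles of the standard normal distribution, which are given in Table 1.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.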

Example 5.10
A study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.

15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62

Can you conclude that the mean wind speed in Penang is less than 12 m/s? Use $\alpha = 0.10$.

Solution
We will follow the six-step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses.
$H_0: \mu = 12$
$H_1: \mu < 12$
Step 3: Calculate the test statistic.
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{14.953 - 12}{\sqrt{20.149 / 40}} = \frac{2.953}{0.710} = 4.159$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40}(X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 4.489^2 = 20.149$$
Step 4: Determine the rejection region.
Reject $H_0$ if $Z < -z_{\alpha} = -z_{0.10} = -1.28$ (from Table 1).
Step 5: Result
Since $Z = 4.159 > -1.28$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.10$, there is insufficient evidence to show that the true mean wind speed (in m/s) in Penang is less than 12 m/s.
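Steps 3 to 5 of Example 5.10 can be sketched in Python as follows. The helper name is illustrative, and the critical value $z_{0.10} = 1.28$ comes from Table 1. Because the test is lower-tailed ($H_1: \mu < 12$), the rejection region is $Z < -z_{\alpha}$.

```python
import math
import statistics


def z_test_statistic(xbar, s2, n, d0):
    """Z = (x-bar - d0) / sqrt(S^2 / n) for a large-sample test on the mean."""
    return (xbar - d0) / math.sqrt(s2 / n)


wind = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]
xbar = statistics.mean(wind)
s2 = statistics.variance(wind)  # S^2 with n - 1 divisor
z = z_test_statistic(xbar, s2, len(wind), d0=12)

# Lower-tail test at alpha = 0.10: reject H0 if Z < -z_{0.10} = -1.28 (Table 1)
reject = z < -1.28
print(round(z, 3), reject)  # 4.161 False
```

The unrounded statistic is about 4.161, slightly above the hand-computed 4.159, because the text rounds the standard error to 0.710. The decision is the same either way.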

Example 5.11
The flow discharge of Sungai Kerian (measured in m³/s) was sampled at random. Fifty readings were collected, and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Show that the true mean flow discharge at Sungai Kerian is not equal to 4 m³/s. Use $\alpha = 0.05$.

Solution
We will follow the six-step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses.
$H_0: \mu = 4$
$H_1: \mu \ne 4$
Step 3: Calculate the test statistic.
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}, \quad \text{where } \bar{X} = 3.512,\ S^2 = 0.25,\ n = 50$$
$$Z = \frac{3.512 - 4}{\sqrt{0.25 / 50}} = \frac{-0.488}{0.071} = -6.873$$
Step 4: Determine the rejection region.
Reject $H_0$ if $Z < -z_{\alpha/2} = -z_{0.025} = -1.96$ or $Z > z_{\alpha/2} = z_{0.025} = 1.96$ (from Table 1).
Step 5: Result
Since $Z = -6.873 < -1.96$, the null hypothesis is rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is sufficient evidence to show that the true mean flow discharge of Sungai Kerian is not equal to 4 m³/s.
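For the two-tailed test of Example 5.11, the rejection region $|Z| > z_{\alpha/2}$ can be checked with a single absolute-value comparison. This is a minimal sketch working from the summary statistics, with $z_{0.025} = 1.96$ taken from Table 1. The function name is illustrative.

```python
import math


def z_statistic(xbar, s, n, d0):
    """Z = (x-bar - d0) / (s / sqrt(n)) from summary statistics."""
    return (xbar - d0) / (s / math.sqrt(n))


z = z_statistic(xbar=3.512, s=0.5, n=50, d0=4)
z_half_alpha = 1.96               # z_{0.025} from Table 1 (alpha = 0.05)
reject = abs(z) > z_half_alpha    # two-tail rejection region |Z| > z_{alpha/2}
print(round(z, 3), reject)        # -6.901 True
```

Without intermediate rounding, $Z \approx -6.901$ rather than the hand value $-6.873$, which rounds $S/\sqrt{n}$ to 0.071. Both lie far inside the rejection region.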

5.14 Test of hypothesis for the mean based on the t distribution

One-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two-tail tests: $H_0: \mu = d_0$ versus $H_1: \mu \ne d_0$

Test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$

Rejection region
Reject $H_0$ if
One-tail tests: $T > t_{\alpha, n-1}$ (or $T < -t_{\alpha, n-1}$)
Two-tail tests: $|T| > t_{\alpha/2, n-1}$

Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $S$ is the sample standard deviation.
(d) $t_{\alpha, n-1}$ is the $100(1 - \alpha)$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical values of the t distribution are given in Table 2.

Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.

Example 5.12
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.

1.81  2.00  2.74  3.56  2.13
4.64  3.64  4.62  4.47  3.12

Is the mean moisture content greater than 3.0%? Use $\alpha = 0.05$.

Solution
We will follow the six-step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean moisture content (in percentage) of clay.
Since the sample size is small ($n = 10$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses.
$H_0: \mu = 3.0$
$H_1: \mu > 3.0$
Step 3: Calculate the test statistic.
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{3.273 - 3.0}{\sqrt{1.190 / 10}} = \frac{0.273}{0.345} = 0.791$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10}(X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.091^2 = 1.190$$
Step 4: Determine the rejection region.
Reject $H_0$ if $T > t_{\alpha, n-1} = t_{0.05, 9} = 1.833$ (from Table 2).
Step 5: Result
Since $T = 0.791 < 1.833$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is insufficient evidence to show that the true mean moisture content (in percentage) of clay is greater than 3%.
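Example 5.12 as code, as a standard-library sketch. The statistic divides $S^2$ by the sample size $n = 10$, not by the $n - 1 = 9$ degrees of freedom, and the critical value $t_{0.05, 9} = 1.833$ is read from Table 2.

```python
import math
import statistics

moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]
n = len(moisture)
xbar = statistics.mean(moisture)
s2 = statistics.variance(moisture)            # S^2 with n - 1 divisor
t_stat = (xbar - 3.0) / math.sqrt(s2 / n)     # T = (x-bar - d0) / sqrt(S^2 / n)

t_crit = 1.833                                # t_{0.05, 9} from Table 2
reject = t_stat > t_crit                      # upper-tail test, H1: mu > 3.0
print(round(t_stat, 3), reject)               # 0.791 False
```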

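5.15 Sample correlation

Correlation measures the linear relationship between two variables, $X$ and $Y$. The sample correlation coefficient of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, denoted by $r$, is given by
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$

The strength of the linear relationship is determined by the following:
If $0.80 \le |r| \le 1.00$, then the relationship is very strong.
If $0.60 \le |r| \le 0.79$, then the relationship is strong.
If $0.40 \le |r| \le 0.59$, then the relationship is moderate.
If $0.20 \le |r| \le 0.39$, then the relationship is weak.
If $0.00 \le |r| \le 0.19$, then the relationship is very weak.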

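Example 5.13

The cost, $Y$, of a manufacturing product usually depends on the lot size, $X$. The following data on the cost of the manufacturing product and its lot size are given below:

Y (cost)       30   70   140   270   530   1000   2000   3000
X (lot size)    1    5    10    25    50    100    250    500

Find the value of the correlation coefficient for the above data.
Solution
The correlation coefficient between $Y$ and $X$ is given by
$$r = \frac{\sum X_i Y_i - \dfrac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}{\sqrt{\left[\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}\right]\left[\sum Y_i^2 - \dfrac{\left(\sum Y_i\right)^2}{n}\right]}} = \frac{2135030 - \dfrac{(941)(7040)}{8}}{\sqrt{\left[325751 - \dfrac{941^2}{8}\right]\left[14379200 - \dfrac{7040^2}{8}\right]}}$$
$$= \frac{1306950}{(463.75)(2860.8)} = \frac{1306950}{1326696} = 0.985$$
Therefore, there is a very strong linear relationship between cost and lot size.

Calculations
$n = 8$, $\sum_{i=1}^{8} X_i = 941$, $\sum_{i=1}^{8} Y_i = 7040$, $\sum_{i=1}^{8} X_i Y_i = 2135030$, $\sum_{i=1}^{8} X_i^2 = 325751$, $\sum_{i=1}^{8} Y_i^2 = 14379200$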
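The computational form of $r$ used in Example 5.13 needs only five running sums. This is a sketch, and `pearson_r` is an illustrative name. On Python 3.10+, `statistics.correlation` gives the same value and can serve as a cross-check.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient using the computational (sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx**2 / n
    syy = sum(y * y for y in ys) - sy**2 / n
    return sxy / math.sqrt(sxx * syy)


lot_size = [1, 5, 10, 25, 50, 100, 250, 500]        # X
cost = [30, 70, 140, 270, 530, 1000, 2000, 3000]    # Y
r = pearson_r(lot_size, cost)
print(round(r, 3))  # 0.985
```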

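5.16 Simple linear regression

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ pairs of random variables. Then the simple linear regression model is given by
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where
$Y_i$ is the dependent or response variable,
$X_i$ is the independent, regressor, explanatory or predictor variable,
$\beta_0$ is the intercept of the regression model,
$\beta_1$ is the slope of the regression model,
$\varepsilon_i$ is the random error term.

Assumptions
The assumptions on the random error term are:
(a) $E(\varepsilon_i) = 0$
(b) $V(\varepsilon_i) = \sigma^2$ (a constant)
(c) The probability distribution of $\varepsilon_i$ is normal.
(d) The random error terms are independent.

Method of least squares
The method of least squares can be used to estimate the values of the intercept ($\beta_0$) and slope ($\beta_1$) parameters. This method minimizes the sum of squares of the random error terms, that is
$$\min L = \min \sum_{i=1}^{n} \varepsilon_i^2 = \min \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2$$
Hence,
$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_i\right) = 0$$
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_i\right)X_i = 0$$
Simplifying yields
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_i X_i$$
Solving the two equations yields
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} \quad \text{and} \quad \hat\beta_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$$
where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Thus the fitted or estimated regression model is
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i, \quad i = 1, 2, \ldots, n$$
and $e_i = Y_i - \hat{Y}_i$ is called the residual.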


Example 5.14
The yield of a chemical process (in percentage) is hypothesized to be linearly related to the amount of catalyst (in grams). Let $Y$ denote the yield of the chemical process and $X$ the amount of catalyst. The data are given below.

X (catalyst, g)   0.9     1.4     1.6     1.7     1.8     2.0     2.1
Y (yield, %)      60.54   63.86   63.76   60.15   66.66   71.66   70.81

Fit a simple linear regression model.

Solution
The following simple linear regression model is fitted
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 7$$
where $Y_i$ is the yield of the chemical process and $X_i$ is the amount of catalyst.
By using the least squares method, the estimates for $\beta_0$ and $\beta_1$ are
$$\hat\beta_1 = \frac{\sum Y_i X_i - \dfrac{\left(\sum Y_i\right)\left(\sum X_i\right)}{n}}{\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}} = \frac{760.17 - 751.5086}{19.87 - 18.8929} = \frac{8.6614}{0.9771} = 8.8644$$
and
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 65.3486 - 8.8644(1.643) = 65.3486 - 14.5642 = 50.7844$$
Therefore the fitted simple linear regression model is
$$\hat{Y}_i = 50.784 + 8.864 X_i \quad \text{for } i = 1, 2, \ldots, 7$$
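The least squares formulas of Section 5.16 reduce to a short function. The sketch below (the names are illustrative) applies it to Example 5.14. On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept and could be used instead.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx**2 / n
    b1 = sxy / sxx               # slope
    b0 = sy / n - b1 * sx / n    # intercept: Y-bar - b1 * X-bar
    return b0, b1


catalyst = [0.9, 1.4, 1.6, 1.7, 1.8, 2.0, 2.1]
yield_pct = [60.54, 63.86, 63.76, 60.15, 66.66, 71.66, 70.81]
b0, b1 = least_squares(catalyst, yield_pct)
residuals = [y - (b0 + b1 * x) for x, y in zip(catalyst, yield_pct)]
print(round(b0, 3), round(b1, 3))  # 50.786 8.864
```

The intercept comes out as 50.786 rather than 50.784 because the hand calculation rounds $\bar{X}$ to 1.643. The residuals sum to zero (up to floating-point error), as the first normal equation requires.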

Example 5.15
A study was conducted to determine the relationship between bridge pier scour depth, $D$, and discharge intensity, $q$. A model of the form $D = \alpha_0 q^{\beta_1}$ was proposed. The following data were obtained:

D      q      D      q      D      q      D      q
35.67  52.51  12.62  11.99  20.73  25.56  11.48  13.22
31.71  52.04   9.76  10.33  11.24   7.39   8.71  11.21
17.84  22.58   8.54   8.36   8.80   6.71   4.94   2.61
14.63   8.51  13.87   8.24  12.44  13.28  10.07  13.21
12.71  11.15  11.60   6.29   9.20   6.49   5.50   1.62
13.72  13.75  19.51  22.03   9.76   6.42   7.13   7.72
12.88  14.31  11.89  11.15  11.42   7.78   6.85   4.68
19.35   9.20  13.72  18.59  11.22  11.85   4.00   3.40
11.92   8.60  11.89  13.66  10.47   9.78   4.07   4.00
14.98  11.43  12.80  15.99   9.48   7.48   4.08   3.18

Determine the regression model for this problem.

Solution
The proposed model is given by
$$D = \alpha_0 q^{\beta_1}$$
This model can be transformed into a simple linear regression model by taking natural logarithms:
$$\ln D = \ln\left(\alpha_0 q^{\beta_1}\right) = \ln \alpha_0 + \ln q^{\beta_1} = \ln \alpha_0 + \beta_1 \ln q$$
Letting $Y_i = \ln D_i$, $\beta_0 = \ln \alpha_0$ and $X_i = \ln q_i$, we obtain the following linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 40$$
The following data give the new values for $Y_i = \ln D_i$ and $X_i = \ln q_i$:

Y_i   X_i   Y_i   X_i   Y_i   X_i   Y_i   X_i
3.57  3.96  2.54  2.48  3.03  3.24  2.44  2.58
3.46  3.95  2.28  2.34  2.42  2.00  2.16  2.42
2.88  3.12  2.14  2.12  2.17  1.90  1.60  0.96
2.68  2.14  2.63  2.11  2.52  2.59  2.31  2.58
2.54  2.41  2.45  1.84  2.22  1.87  1.70  0.48
2.62  2.62  2.97  3.09  2.28  1.86  1.96  2.04
2.56  2.66  2.48  2.41  2.44  2.05  1.92  1.54
2.96  2.22  2.62  2.92  2.42  2.47  1.39  1.22
2.48  2.15  2.48  2.61  2.35  2.28  1.40  1.39
2.71  2.44  2.55  2.77  2.25  2.01  1.41  1.16
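By using the least squares method, the estimates for $\beta_0$ and $\beta_1$ are
$$\hat\beta_1 = \frac{\sum Y_i X_i - \dfrac{\left(\sum Y_i\right)\left(\sum X_i\right)}{n}}{\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}} = \frac{230.09 - 218.4492}{226.25 - 207.1615} = \frac{11.6408}{19.0885} = 0.6098$$
and
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 2.3997 - 0.6098(2.2757) = 2.3997 - 1.3877 = 1.012$$
Here $\hat\beta_0 = \ln \hat\alpha_0 = 1.012$, so $\hat\alpha_0 = e^{\hat\beta_0} = e^{1.012} = 2.7511$.
Therefore the fitted model is
$$\hat{D} = \hat\alpha_0 q^{\hat\beta_1} = 2.7511\, q^{0.6098}$$

Calculations
$$\bar{Y} = \frac{\sum_{i=1}^{40} Y_i}{40} = \frac{3.57 + 3.46 + \cdots + 1.40 + 1.41}{40} = \frac{95.99}{40} = 2.3997$$
$$\bar{X} = \frac{\sum_{i=1}^{40} X_i}{40} = \frac{3.96 + 3.95 + \cdots + 1.39 + 1.16}{40} = \frac{91.03}{40} = 2.2757$$
$$\sum_{i=1}^{40} Y_i X_i = (3.57)(3.96) + (3.46)(3.95) + \cdots + (1.40)(1.39) + (1.41)(1.16) = 230.09$$
$$\frac{\left(\sum_{i=1}^{40} Y_i\right)\left(\sum_{i=1}^{40} X_i\right)}{n} = \frac{(95.99)(91.03)}{40} = \frac{8737.9697}{40} = 218.4492$$
$$\sum_{i=1}^{40} X_i^2 = 3.96^2 + 3.95^2 + 3.12^2 + \cdots + 1.22^2 + 1.39^2 + 1.16^2 = 15.69 + 15.62 + 9.72 + \cdots + 1.50 + 1.92 + 1.34 = 226.25$$
$$\frac{\left(\sum_{i=1}^{40} X_i\right)^2}{n} = \frac{91.03^2}{40} = \frac{8286.4609}{40} = 207.1615$$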
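The log-transform approach of Example 5.15 can be wrapped in one function. It takes logs, fits a straight line by least squares, and back-transforms the intercept with $\alpha_0 = e^{\beta_0}$. The function name `fit_power_law` is illustrative. It is checked here on synthetic data generated exactly from $D = 2q^{0.5}$, where it should recover $\alpha_0 = 2$ and $\beta_1 = 0.5$. Applied to the 40 raw $(D, q)$ pairs of the example, it should give estimates close to the hand results. These may differ in the later decimals, because the text works with logarithms rounded to two decimals.

```python
import math


def fit_power_law(q, d):
    """Fit D = a0 * q**b1 by least squares on the log-log data (ln q, ln D).

    Returns (a0, b1); the fitted log-log intercept is b0 = ln(a0).
    """
    xs = [math.log(v) for v in q]
    ys = [math.log(v) for v in d]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx**2 / n
    b1 = sxy / sxx
    b0 = sy / n - b1 * sx / n
    return math.exp(b0), b1  # back-transform the intercept: a0 = e^{b0}


# Synthetic check: data generated exactly from D = 2 * q**0.5
q_demo = [1.0, 2.0, 4.0, 8.0, 16.0]
d_demo = [2.0 * v**0.5 for v in q_demo]
a0_demo, b1_demo = fit_power_law(q_demo, d_demo)

# (D, q) pairs of Example 5.15, read row by row from the data table
pairs = [
    (35.67, 52.51), (12.62, 11.99), (20.73, 25.56), (11.48, 13.22),
    (31.71, 52.04), (9.76, 10.33), (11.24, 7.39), (8.71, 11.21),
    (17.84, 22.58), (8.54, 8.36), (8.80, 6.71), (4.94, 2.61),
    (14.63, 8.51), (13.87, 8.24), (12.44, 13.28), (10.07, 13.21),
    (12.71, 11.15), (11.60, 6.29), (9.20, 6.49), (5.50, 1.62),
    (13.72, 13.75), (19.51, 22.03), (9.76, 6.42), (7.13, 7.72),
    (12.88, 14.31), (11.89, 11.15), (11.42, 7.78), (6.85, 4.68),
    (19.35, 9.20), (13.72, 18.59), (11.22, 11.85), (4.00, 3.40),
    (11.92, 8.60), (11.89, 13.66), (10.47, 9.78), (4.07, 4.00),
    (14.98, 11.43), (12.80, 15.99), (9.48, 7.48), (4.08, 3.18),
]
a0, b1 = fit_power_law([q for _, q in pairs], [d for d, _ in pairs])
print(f"D-hat = {a0:.4f} * q^{b1:.4f}")
```

Fitting on the log scale minimizes squared errors in $\ln D$, not in $D$ itself. This is the same choice the worked solution makes, and it is why the back-transformed curve is not the least squares fit on the original scale.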
