Definition of statistics: The mathematics of the collection, organization and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.
Let
, thus
If
then
(b) If
then
and
and
independent, then
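$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_k)$$
$$= P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \cdots + P(A \mid E_k)P(E_k)$$
If $A$ occurs, then
$$P(E_i \mid A) = \frac{P(A \mid E_i)P(E_i)}{\sum_{j=1}^{k} P(A \mid E_j)P(E_j)}, \quad i = 1, 2, \ldots, k$$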
Example 5.1 Three machines produce similar car parts. Machine A produces 40% of the total output, and machines B and C produce 25% and 15%, respectively. The proportions of the output from each machine that do not conform to the specification are 10% for A, 5% for B and 1% for C. What proportion of the parts that do not conform to the specification are produced by machine A?
Solution
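Let $D$ represent the event that a particular part is defective (does not conform to the specification). Then the overall proportion of defective parts is
$$P(D) = P(D \mid A)P(A) + P(D \mid B)P(B) + P(D \mid C)P(C)$$
and the proportion of defective parts produced by machine A is
$$P(A \mid D) = \frac{P(D \mid A)P(A)}{P(D)}$$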
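The two formulas can be checked numerically. The short Python sketch below is an illustration, not part of the original notes (the helper name `posterior` is our own). It applies the total probability theorem and Bayes' theorem to the figures of Example 5.1 exactly as stated. Note that the stated output shares add up to 80%, so the sketch uses them as given rather than rescaling them.

```python
def posterior(priors, likelihoods, target):
    """Bayes' theorem: P(E_target | D) = P(D | E_target) P(E_target) / sum_j P(D | E_j) P(E_j)."""
    total = sum(priors[k] * likelihoods[k] for k in priors)   # total probability P(D)
    return priors[target] * likelihoods[target] / total, total

# Example 5.1, percentages taken exactly as stated in the problem.
share = {"A": 0.40, "B": 0.25, "C": 0.15}          # P(machine)
defect_rate = {"A": 0.10, "B": 0.05, "C": 0.01}    # P(D | machine)
p_a_given_d, p_d = posterior(share, defect_rate, "A")
print(round(p_d, 4), round(p_a_given_d, 4))
```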
Example 5.2 Suppose that 0.1% of the people in a certain area have a disease D and that a mass screening test is used to detect cases. The test gives either a positive result or a negative result for each person. In practice the test gives a positive result with probability 99.9% for a person who has D and with probability 0.2% for a person who does not. What is the probability that a person for whom the test is positive actually has the disease?
Solution
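Let $T$ represent the event that the test gives a positive result.
Then, by Bayes' theorem,
$$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid D')P(D')} = \frac{(0.999)(0.001)}{(0.999)(0.001) + (0.002)(0.999)} = \frac{0.000999}{0.002997} \approx 0.333$$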
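A few lines of Python reproduce this calculation (the variable names are illustrative). The result shows why a positive screening result for a rare disease is weak evidence on its own: only about one in three people who test positive actually has D.

```python
# Example 5.2: screening test for disease D.
p_d = 0.001                 # prevalence P(D)
p_pos_given_d = 0.999       # P(T | D)
p_pos_given_not_d = 0.002   # P(T | D')

# Total probability of a positive test, then Bayes' theorem for P(D | T).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_pos, 6), round(p_d_given_pos, 4))
```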
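A function $p(x)$ is the probability distribution of a discrete random variable $X$ if
(a) $p(x) \ge 0$
(b) $\sum_{x} p(x) = 1$

If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, its cumulative distribution function is
$$F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$
which satisfies
(a) $0 \le F(x) \le 1$
(b) if $x \le y$, then $F(x) \le F(y)$.

If $X$ is a discrete random variable with probability distribution $p(x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \sum_{x} x\,p(x)$$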
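If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \sum_{x} (x - \mu_X)^2 p(x) = \sum_{x} x^2 p(x) - \mu_X^2.$$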
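Example 5.3
The number of successful projects per day, $X$, has the probability distribution
$$P(X = x) = \begin{cases} \dfrac{x}{10}, & x = 0, 1, 2, 3, 4 \\ 0, & \text{otherwise} \end{cases}$$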
Find the cumulative distribution function for X . Find the mean and variance for the number of successful projects
per day.
Solution
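The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \sum_{t \le x} P_X(t)$$
For $x = 0$: $F(0) = P(X \le 0) = P(0) = \frac{0}{10} = 0$
For $x = 1$: $F(1) = P(X \le 1) = P(0) + P(1) = 0 + \frac{1}{10} = 0.1$
For $x = 2$: $F(2) = P(X \le 2) = P(0) + P(1) + P(2) = 0 + \frac{1}{10} + \frac{2}{10} = 0.3$
For $x = 3$: $F(3) = P(X \le 3) = P(0) + P(1) + P(2) + P(3) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = 0.6$
For $x = 4$: $F(4) = P(X \le 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = 1.0$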
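As an illustrative sketch (not from the original notes), the Python code below reproduces these CDF values and also computes the mean and variance the example asks for, using $E(X) = \sum x\,p(x)$ and $V(X) = \sum x^2 p(x) - \mu_X^2$. Exact fractions avoid rounding; the result is $\mu_X = 3$ and $\sigma_X^2 = 1$.

```python
from fractions import Fraction

# Example 5.3: P(X = x) = x/10 for x = 0, 1, 2, 3, 4 (0 otherwise).
# Fractions keep the arithmetic exact, so the CDF values match the hand calculation.
pmf = {x: Fraction(x, 10) for x in range(5)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

mean = sum(x * p for x, p in pmf.items())                # E(X) = sum x p(x)
second_moment = sum(x * x * p for x, p in pmf.items())   # E(X^2) = sum x^2 p(x)
variance = second_moment - mean ** 2                     # V(X) = E(X^2) - mu^2

print([float(cdf(x)) for x in range(5)])  # [0.0, 0.1, 0.3, 0.6, 1.0]
print(mean, variance)                      # 3 1
```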
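Let $X$ be a continuous random variable defined on the set of real numbers. A function $f(x)$ is called a probability density function of $X$ if
(a) $f(x) \ge 0$ for all $x$
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(c) $P(a < X < b) = \int_{a}^{b} f(x)\,dx$, where $a$ and $b$ are real numbers with $a < b$.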
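The cumulative distribution function of a continuous random variable $X$ with probability density function $f(x)$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$
so that
(a) $P(X \le a) = \int_{-\infty}^{a} f(t)\,dt$
(b) $P(X \ge a) = \int_{a}^{\infty} f(t)\,dt$
(c) $P(a \le X \le b) = \int_{a}^{b} f(t)\,dt$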
If $X$ is a continuous random variable with probability density function $f(x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
If $X$ is a continuous random variable with probability density function $f(x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu_X^2.$$
Example 5.4 Assume that the particle size of an air pollutant (in micrometers) can be described by the following probability density function:
$$f_X(x) = \begin{cases} \dfrac{3}{x^4}, & x \ge 1 \\ 0, & \text{otherwise} \end{cases}$$
(a) Show that $f(x)$ is a probability density function.
(b) Find the cumulative distribution function.
(c) Determine the mean and standard deviation.
Solution
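(a) Clearly $f_X(x) \ge 0$ for all $x$, so it remains to show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Here
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{1}^{\infty} \frac{3}{x^4}\,dx = \left[-\frac{1}{x^3}\right]_{1}^{\infty} = 0 - (-1) = 1$$
Therefore $f_X(x)$ is a probability density function.

(b) For $x \ge 1$,
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_{1}^{x} \frac{3}{t^4}\,dt = \left[-\frac{1}{t^3}\right]_{1}^{x} = 1 - \frac{1}{x^3}$$
and $F_X(x) = 0$ for $x < 1$.

(c) The mean of $X$ is
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{\infty} x \cdot \frac{3}{x^4}\,dx = \int_{1}^{\infty} \frac{3}{x^3}\,dx = \left[-\frac{3}{2x^2}\right]_{1}^{\infty} = \frac{3}{2} \text{ micrometers}$$
The variance of $X$ is given by
$$V(X) = \sigma_X^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2 = \int_{1}^{\infty} \frac{3}{x^2}\,dx - \left(\frac{3}{2}\right)^2 = \left[-\frac{3}{x}\right]_{1}^{\infty} - \frac{9}{4} = 3 - \frac{9}{4} = \frac{3}{4} \text{ sq. micrometers}$$
so the standard deviation is $\sigma_X = \sqrt{3/4} \approx 0.866$ micrometers.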
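Numerical integration is a useful cross-check on the hand calculations in Example 5.4. The sketch below is an illustration, not part of the original notes, and `integrate_tail` is a hypothetical helper: it maps $[1, \infty)$ onto $(0, 1]$ with the substitution $u = 1/x$ and applies the midpoint rule, so no special library is needed.

```python
import math

def pdf(x: float) -> float:
    """Particle-size density from Example 5.4: f(x) = 3/x^4 for x >= 1, else 0."""
    return 3.0 / x**4 if x >= 1.0 else 0.0

def cdf(x: float) -> float:
    """Closed-form CDF derived in part (b): F(x) = 1 - 1/x^3 for x >= 1, else 0."""
    return 1.0 - 1.0 / x**3 if x >= 1.0 else 0.0

def integrate_tail(g, n: int = 100_000) -> float:
    """Approximate the improper integral of g over [1, inf).

    Substituting u = 1/x (so dx = -du/u^2) maps [1, inf) onto (0, 1]:
        int_1^inf g(x) dx = int_0^1 g(1/u) / u^2 du,
    which is then evaluated with the midpoint rule (u = 0 is never sampled).
    """
    h = 1.0 / n
    return h * sum(g(1.0 / ((i + 0.5) * h)) / ((i + 0.5) * h) ** 2 for i in range(n))

area = integrate_tail(pdf)                                 # should be 1
mean = integrate_tail(lambda x: x * pdf(x))                # E(X), should be 3/2
second_moment = integrate_tail(lambda x: x * x * pdf(x))   # E(X^2), should be 3
variance = second_moment - mean**2                         # 3 - 9/4 = 3/4
sd = math.sqrt(variance)
print(round(area, 6), round(mean, 6), round(variance, 6), round(sd, 4))
```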
5.8 Discrete distributions
Bernoulli distribution
PMF: $P(X = x) = p^{x}(1-p)^{1-x}$
Range: $x = 0, 1$ and $0 \le p \le 1$
Mean: $p$
Variance: $p(1-p)$
Binomial distribution
PMF: $P(X = x) = \binom{n}{x} p^{x}(1-p)^{n-x}$
Range: $x = 0, 1, \ldots, n$ and $0 \le p \le 1$
Parameters: $n$ and $p$
Mean: $np$
Variance: $np(1-p)$
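Poisson distribution
PMF: $P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
Range: $x = 0, 1, 2, \ldots$
Parameter: $\lambda > 0$
Mean: $\lambda$
Variance: $\lambda$
If $n$ is large and $p$ is small, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$.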
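The approximation can be seen numerically. In the sketch below, $n = 1000$ and $p = 0.002$ are illustrative values chosen for the comparison (they do not come from the notes), and `binomial_pmf` and `poisson_pmf` are hypothetical helper names that implement the two PMFs above with Python's `math.comb` and `math.factorial`.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam^x e^(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

# Illustrative values (not from the notes): n large, p small, lam = n p = 2.
n, p = 1000, 0.002
lam = n * p
for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

The two columns agree to about three decimal places, which is the practical meaning of "large $n$, small $p$".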
Example 5.6 The number of flaws in a thin copper wire follows a Poisson distribution with a mean of 2.3 flaws per mm. (a) Determine the probability of exactly two flaws in 1 mm of wire. (b) Determine the probability of ten flaws in 5 mm of wire.
Solution
(a) Let $X$ be the number of flaws in 1 mm of wire. Then $X$ has a Poisson distribution with $\lambda = 2.3$ flaws, thus
$$P(X = 2) = \frac{e^{-2.3}(2.3)^2}{2!} = 0.265$$
(b) Let $X$ be the number of flaws in 5 mm of wire. Then $X$ has a Poisson distribution with $\lambda = 5(2.3) = 11.5$ flaws, thus
$$P(X = 10) = \frac{e^{-11.5}(11.5)^{10}}{10!} = 0.113$$
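A minimal Python check of both parts (the helper name `poisson_pmf` is illustrative). The key step is scaling the rate: the mean number of flaws in 5 mm is $5 \times 2.3 = 11.5$.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) = lam^x e^(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

rate_per_mm = 2.3                         # mean flaws per mm (Example 5.6)
p_a = poisson_pmf(2, rate_per_mm * 1)     # (a) exactly 2 flaws in 1 mm, lam = 2.3
p_b = poisson_pmf(10, rate_per_mm * 5)    # (b) 10 flaws in 5 mm, lam = 11.5
print(round(p_a, 4), round(p_b, 4))
```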
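Normal distribution
PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$
Range: $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Parameters: $\mu$ and $\sigma$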
[Figure: plot of the normal probability density f(x) against x]
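Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then the point estimates for $\mu$ and $\sigma^2$ are
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
and
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $\bar{x}$ is the sample mean and $s^2$ is the sample variance.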
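The two point estimates translate directly into code. The sketch below (helper names are illustrative) applies them to the moisture-content sample used later in Examples 5.9 and 5.12, and confirms that Python's `statistics.variance` uses the same $n - 1$ divisor.

```python
import statistics

def sample_mean(xs):
    """x-bar = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum (x_i - x-bar)^2 / (n - 1), the point estimate of sigma^2."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# Moisture-content sample used in Examples 5.9 and 5.12.
moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]
print(round(sample_mean(moisture), 3), round(sample_variance(moisture), 3))

# The standard library agrees: statistics.variance also divides by n - 1.
assert abs(sample_variance(moisture) - statistics.variance(moisture)) < 1e-12
```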
5.11 Confidence interval for the mean based on the normal distribution
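(1) Population variance is known
The $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.

(2) Population variance is unknown
The $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
where $S$ is the sample standard deviation.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) $n$ is large.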
Table 1: Cumulative standard normal distribution
$$P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$$

z        .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
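The entries of Table 1 can be regenerated from the error function, since $P(Z \le z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below (the name `phi` is our own) spot-checks a few entries and uses the standard library's `NormalDist.inv_cdf` to read the table backwards. It gives $z_{0.05} \approx 1.645$, which the examples below round to 1.65 from the table.

```python
import math
from statistics import NormalDist

def phi(z: float) -> float:
    """Standard normal CDF P(Z <= z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Spot-check a few Table 1 entries (rounded to 4 decimals as in the table).
for z in (0.00, 1.00, 1.65, 1.96, 2.33, 3.40):
    print(f"{z:.2f}  {phi(z):.4f}")

# The inverse direction (reading the table "backwards"): z such that P(Z <= z) = 0.95.
print(round(NormalDist().inv_cdf(0.95), 4))
```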
Example 5.7
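A research study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.
15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62
Construct a 90% confidence interval for the true mean wind speed in Penang.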
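Solution
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the population variance is unknown and the sample size is large ($n = 40$), the 90% confidence interval for $\mu$ is given by
$$\bar{X} - z_{0.05}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05}\frac{S}{\sqrt{n}}$$
Calculations
$$\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40}(X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 20.149, \quad S = 4.489$$
From Table 1, $z_{0.05} = 1.65$. Hence
$$14.953 - 1.65\frac{4.489}{\sqrt{40}} \le \mu \le 14.953 + 1.65\frac{4.489}{\sqrt{40}}$$
$$13.782 \le \mu \le 16.124$$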
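The same interval can be computed directly from the raw data. The sketch below (the helper `z_interval` is an illustrative name) uses `statistics.stdev`, which divides by $n - 1$ as in the formula for $S^2$, and keeps the table value $z_{0.05} = 1.65$ so the result matches the hand calculation.

```python
import math
import statistics

# Monthly wind speed data for Penang (m/s), Examples 5.7 and 5.10.
wind = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]

def z_interval(xs, z):
    """Large-sample CI: x-bar -/+ z * S / sqrt(n), with S the sample standard deviation."""
    n = len(xs)
    xbar = statistics.mean(xs)
    half_width = z * statistics.stdev(xs) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(wind, z=1.65)   # z_{0.05} = 1.65 as read from Table 1
print(f"{lo:.3f} <= mu <= {hi:.3f}")
```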
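Example 5.8
The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. Fifty readings were collected and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Construct a 99% confidence interval for the true mean flow discharge of Sungai Kerian.
Solution
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the population variance is unknown and the sample size is large ($n = 50$), the 99% confidence interval for $\mu$ is given by
$$\bar{X} - z_{0.005}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.005}\frac{S}{\sqrt{n}}$$
Calculations
$\bar{X} = 3.512$, $S = 0.5$, $n = 50$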
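When only summary statistics are available, the interval needs just $\bar{X}$, $S$ and $n$. The sketch below (the helper name is hypothetical) takes $z_{0.005}$ from `NormalDist.inv_cdf` (about 2.576) rather than a two-decimal table lookup, so its third decimal may differ slightly from a hand calculation with a rounded $z$ value.

```python
import math
from statistics import NormalDist

def z_interval_from_summary(xbar, s, n, confidence):
    """CI for the mean from summary statistics: xbar -/+ z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 point of N(0, 1)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 5.8: n = 50, x-bar = 3.512 m^3/s, s = 0.5 m^3/s, 99% confidence.
lo, hi = z_interval_from_summary(3.512, 0.5, 50, 0.99)
print(f"{lo:.3f} <= mu <= {hi:.3f}")
```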
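When the population variance is unknown and the sample size is small, the $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the $100(1-\alpha/2)$th percentile of the $t$ distribution with $n - 1$ degrees of freedom.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) $n$ is small.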
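Example 5.9
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.
1.81  2.00  2.74  3.56  2.13
4.64  3.64  4.62  4.47  3.12
Construct a 98% confidence interval for the true mean moisture content of clay, assuming that the sample is from a normal distribution.
Solution
Let $\mu$ be the true mean moisture content (in percentage) for clay.
Since the sample size is small ($n = 10$), the 98% confidence interval for $\mu$ is given by
$$\bar{X} - t_{0.01,\,9}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{0.01,\,9}\frac{S}{\sqrt{n}}$$
Calculations
$$\bar{X} = \frac{1}{10}\sum_{i=1}^{10} X_i = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10}(X_i - \bar{X})^2 = 1.190, \quad S = 1.091$$
From Table 2, $t_{0.01,\,9} = 2.821$. Hence
$$3.273 - 2.821\frac{1.091}{\sqrt{10}} \le \mu \le 3.273 + 2.821\frac{1.091}{\sqrt{10}}$$
$$2.300 \le \mu \le 4.246$$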
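The small-sample interval has the same shape as the large-sample one, with a $t$ critical value in place of $z$. Python's standard library has no $t$-distribution quantile function, so the sketch below (the helper `t_interval` is an illustrative name) takes $t_{0.01,9} = 2.821$ from Table 2 as an argument.

```python
import math
import statistics

def t_interval(xs, t_crit):
    """Small-sample CI for mu: x-bar -/+ t_{alpha/2, n-1} * S / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom (e.g. from Table 2).
    """
    n = len(xs)
    xbar = statistics.mean(xs)
    half_width = t_crit * statistics.stdev(xs) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]
lo, hi = t_interval(moisture, t_crit=2.821)   # t_{0.01, 9} from Table 2
print(f"{lo:.3f} <= mu <= {hi:.3f}")
```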
Table 2: Critical values $t_{\alpha,\nu}$ of the $t$ distribution ($\alpha$ = upper-tail probability, $\nu$ = degrees of freedom, df)

df     0.40    0.30    0.20    0.15    0.10    0.05   0.025    0.02   0.015    0.01
1     0.325   0.727   1.376   1.963   3.078   6.314  12.706  15.895  21.205  31.821
2     0.289   0.617   1.061   1.386   1.886   2.920   4.303   4.849   5.643   6.965
3     0.277   0.584   0.978   1.250   1.638   2.353   3.182   3.482   3.896   4.541
4     0.271   0.569   0.941   1.190   1.533   2.132   2.776   2.999   3.298   3.747
5     0.267   0.559   0.920   1.156   1.476   2.015   2.571   2.757   3.003   3.365
6     0.265   0.553   0.906   1.134   1.440   1.943   2.447   2.612   2.829   3.143
7     0.263   0.549   0.896   1.119   1.415   1.895   2.365   2.517   2.715   2.998
8     0.262   0.546   0.889   1.108   1.397   1.860   2.306   2.449   2.634   2.897
9     0.261   0.543   0.883   1.100   1.383   1.833   2.262   2.398   2.574   2.821
10    0.260   0.542   0.879   1.093   1.372   1.813   2.228   2.359   2.528   2.764
11    0.260   0.540   0.876   1.088   1.363   1.796   2.201   2.328   2.491   2.718
12    0.259   0.539   0.873   1.083   1.356   1.782   2.179   2.303   2.461   2.681
13    0.259   0.538   0.870   1.080   1.350   1.771   2.160   2.282   2.436   2.650
14    0.258   0.537   0.868   1.076   1.345   1.761   2.145   2.264   2.415   2.625
15    0.258   0.536   0.866   1.074   1.341   1.753   2.131   2.249   2.397   2.603
16    0.258   0.535   0.865   1.071   1.337   1.746   2.120   2.235   2.382   2.584
17    0.257   0.534   0.863   1.069   1.333   1.740   2.110   2.224   2.368   2.567
18    0.257   0.534   0.862   1.067   1.330   1.734   2.101   2.214   2.356   2.552
19    0.257   0.533   0.861   1.066   1.328   1.729   2.093   2.205   2.346   2.540
20    0.257   0.533   0.860   1.064   1.325   1.725   2.086   2.197   2.336   2.528
21    0.257   0.532   0.859   1.063   1.323   1.721   2.080   2.189   2.328   2.518
22    0.256   0.532   0.858   1.061   1.321   1.717   2.074   2.183   2.320   2.508
23    0.256   0.532   0.858   1.060   1.320   1.714   2.069   2.177   2.313   2.500
24    0.256   0.531   0.857   1.059   1.318   1.711   2.064   2.172   2.307   2.492
25    0.256   0.531   0.856   1.058   1.316   1.708   2.060   2.167   2.301   2.485
26    0.256   0.531   0.856   1.058   1.315   1.706   2.056   2.162   2.296   2.479
27    0.256   0.531   0.855   1.057   1.314   1.703   2.052   2.158   2.291   2.473
28    0.256   0.530   0.855   1.056   1.313   1.701   2.048   2.154   2.286   2.467
29    0.256   0.530   0.854   1.055   1.311   1.699   2.045   2.150   2.282   2.462
30    0.256   0.530   0.854   1.055   1.310   1.697   2.042   2.147   2.278   2.457
40    0.255   0.529   0.851   1.050   1.303   1.684   2.021   2.123   2.250   2.423
60    0.254   0.527   0.848   1.046   1.296   1.671   2.000   2.099   2.223   2.390
120   0.254   0.526   0.845   1.041   1.289   1.658   1.980   2.076   2.196   2.358
∞     0.253   0.524   0.842   1.036   1.282   1.645   1.960   2.054   2.170   2.326
5.13 Tests of hypotheses for the mean based on the normal distribution
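(1) Population variance is known
One-tail tests:
$H_0: \mu = d_0$ against $H_1: \mu < d_0$
$H_0: \mu = d_0$ against $H_1: \mu > d_0$
Two-tail test:
$H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{\sigma^2/n}}$$
Rejection region
Reject $H_0$ if
$Z < -z_{\alpha}$ (for $H_1: \mu < d_0$)
$Z > z_{\alpha}$ (for $H_1: \mu > d_0$)
$Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$ (for $H_1: \mu \ne d_0$)
Notes:
(a) $z_{\alpha}$ is the $100(1-\alpha)$th percentile of the standard normal distribution.
(b) $d_0$ is a constant.
(c) $\bar{X}$ is the sample mean.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.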
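(2) Population variance is unknown
One-tail tests:
$H_0: \mu = d_0$ against $H_1: \mu < d_0$
$H_0: \mu = d_0$ against $H_1: \mu > d_0$
Two-tail test:
$H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2/n}}$$
Rejection region
Reject $H_0$ if
$Z < -z_{\alpha}$ (for $H_1: \mu < d_0$)
$Z > z_{\alpha}$ (for $H_1: \mu > d_0$)
$Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$ (for $H_1: \mu \ne d_0$)
Notes:
(a) $d_0$ is a constant.
(b) $S^2$ is the sample variance.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) $n$ is large.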
Example 5.10
A research study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.
15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62
Can you conclude that the mean wind speed in Penang is less than 12 m/s? Use $\alpha = 0.10$.
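Solution
We will follow the six-step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 12$
$H_1: \mu < 12$
Step 3: Calculate the test statistic
Since the population variance is unknown and $n = 40$ is large,
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2/n}} = \frac{14.953 - 12}{\sqrt{20.149/40}} = \frac{2.953}{0.710} = 4.159$$
Calculations
$$\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40}(X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 4.489^2 = 20.149$$
Step 4: Define the rejection region
Reject $H_0$ if $Z < -z_{0.10} = -1.28$.
Step 5: Result
Since $Z = 4.159 > -1.28$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.10$, there is insufficient evidence to show that the true mean wind speed in Penang is less than 12 m/s.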
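The whole test can be run from the raw data. The sketch below (`one_sample_z` is an illustrative name) carries full precision, so it gives $Z \approx 4.161$ rather than 4.159 (the hand calculation rounds the denominator to 0.710); the decision is the same.

```python
import math
import statistics
from statistics import NormalDist

# Monthly wind speed data for Penang (m/s), Example 5.10.
wind = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]

def one_sample_z(xs, d0):
    """Large-sample test statistic Z = (x-bar - d0) / sqrt(S^2 / n)."""
    n = len(xs)
    return (statistics.mean(xs) - d0) / math.sqrt(statistics.variance(xs) / n)

alpha = 0.10
z = one_sample_z(wind, d0=12.0)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.2816; Table 1 gives 1.28
reject = z < -z_alpha                        # left-tailed test, H1: mu < 12
print(round(z, 3), reject)
```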
Example 5.11
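The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. Fifty readings were collected and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Show that the true mean flow discharge of Sungai Kerian is not equal to 4 m³/s. Use $\alpha = 0.05$.
Solution
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean flow discharge (in m³/s) of Sungai Kerian.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 4$
$H_1: \mu \ne 4$
Step 3: Calculate the test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2/n}}, \quad \text{where } \bar{X} = 3.512,\; S^2 = 0.25,\; n = 50$$
$$Z = \frac{3.512 - 4}{\sqrt{0.25/50}} = \frac{-0.488}{0.071} = -6.873$$
Step 4: Define the rejection region
Reject $H_0$ if $Z < -z_{0.025} = -1.96$ or $Z > z_{0.025} = 1.96$.
Step 5: Result
Since $Z = -6.873 < -1.96$, the null hypothesis is rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is sufficient evidence to show that the true mean flow discharge of Sungai Kerian is not equal to 4 m³/s.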
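A two-tailed version of the same calculation, working from summary statistics only (the helper name is hypothetical). Without rounding $\sqrt{0.25/50} = 0.0707$ to 0.071, the statistic is $Z \approx -6.901$; either way it lies far outside $\pm 1.96$.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(xbar, s, n, d0, alpha):
    """Two-tailed large-sample test of H0: mu = d0 against H1: mu != d0."""
    z = (xbar - d0) / math.sqrt(s**2 / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return z, abs(z) > z_crit

z, reject = z_test_two_tailed(xbar=3.512, s=0.5, n=50, d0=4.0, alpha=0.05)
print(round(z, 3), reject)
```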
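When the population variance is unknown and the sample size is small, the following $t$ test is used.
One-tail tests:
$H_0: \mu = d_0$ against $H_1: \mu < d_0$
$H_0: \mu = d_0$ against $H_1: \mu > d_0$
Two-tail test:
$H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2/n}}$$
Rejection region
Reject $H_0$ if
$T < -t_{\alpha,\,n-1}$ (for $H_1: \mu < d_0$)
$T > t_{\alpha,\,n-1}$ (for $H_1: \mu > d_0$)
$T < -t_{\alpha/2,\,n-1}$ or $T > t_{\alpha/2,\,n-1}$ (for $H_1: \mu \ne d_0$)
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $S$ is the sample standard deviation.
(d) $t_{\alpha,\,n-1}$ is the $100(1-\alpha)$th percentile of the $t$ distribution with $n - 1$ degrees of freedom.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) $n$ is small.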
Example 5.12
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.
1.81  2.00  2.74  3.56  2.13
4.64  3.64  4.62  4.47  3.12
Is the mean moisture content greater than 3.0%? Use $\alpha = 0.05$.
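Solution
We will follow the six-step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean moisture content (in percentage) for clay.
Since the sample size is small ($n = 10$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 3.0$
$H_1: \mu > 3.0$
Step 3: Calculate the test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2/n}} = \frac{3.273 - 3.0}{\sqrt{1.190/10}} = \frac{0.273}{0.345} = 0.791$$
Calculations
$$\bar{X} = \frac{1}{10}\sum_{i=1}^{10} X_i = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10}(X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.190$$
Step 4: Define the rejection region
Reject $H_0$ if $T > t_{0.05,\,9} = 1.833$.
Step 5: Result
Since $T = 0.791 < 1.833$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is insufficient evidence to show that the true mean moisture content (in percentage) for clay is greater than 3%.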
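A Python sketch of the one-sample $t$ statistic (`one_sample_t` is an illustrative name). Note that $S^2$ is divided by the sample size $n = 10$ in the denominator; the $n - 1$ only enters through $S^2$ itself and the degrees of freedom of the critical value, which is taken from Table 2.

```python
import math
import statistics

def one_sample_t(xs, d0):
    """T = (x-bar - d0) / sqrt(S^2 / n); the divisor inside the root is n, not n - 1."""
    n = len(xs)
    return (statistics.mean(xs) - d0) / math.sqrt(statistics.variance(xs) / n)

moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]
t_stat = one_sample_t(moisture, d0=3.0)
t_crit = 1.833                 # t_{0.05, 9} from Table 2 (right-tailed, alpha = 0.05)
print(round(t_stat, 3), t_stat > t_crit)
```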
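Correlation measures the linear relationship between two variables, $X$ and $Y$. The sample correlation coefficient of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, denoted by $r$, is given by
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
or, equivalently,
$$r = \frac{\sum_{i=1}^{n} X_iY_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$
The strength of the linear relationship can be described as follows:
If $0.80 \le |r| \le 1.00$: very strong
If $0.60 \le |r| \le 0.79$: strong
If $0.40 \le |r| \le 0.59$: moderate
If $0.20 \le |r| \le 0.39$: weak
If $0.00 \le |r| \le 0.19$: very weak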
Example 5.13
The cost, $Y$, of a manufacturing product usually depends on the lot size, $X$. The following data on the cost of the manufacturing product and its lot size are given below:
Y    30   70   140   270   530   1000   2000   3000
X     1    5    10    25    50    100    250    500
Find the value of the correlation coefficient for the above data.
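Solution
The correlation coefficient between $Y$ and $X$ is given by
$$r = \frac{\sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sqrt{\left[\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right]\left[\sum Y_i^2 - \frac{(\sum Y_i)^2}{n}\right]}} = \frac{2135030 - \frac{(941)(7040)}{8}}{\sqrt{\left[325751 - \frac{941^2}{8}\right]\left[14379200 - \frac{7040^2}{8}\right]}}$$
$$= \frac{1306950}{\sqrt{(215065.875)(8184000)}} = 0.985$$
Therefore, there is a very strong linear relationship between cost and lot size.
Calculations
$n = 8$, $\sum_{i=1}^{8} X_i = 941$, $\sum_{i=1}^{8} Y_i = 7040$, $\sum_{i=1}^{8} X_iY_i = 2135030$, $\sum_{i=1}^{8} X_i^2 = 325751$, $\sum_{i=1}^{8} Y_i^2 = 14379200$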
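The computational formula is easy to code. The sketch below (`pearson_r` is an illustrative name) builds $S_{xy}$, $S_{xx}$ and $S_{yy}$ from the same sums as the hand calculation; on Python 3.10 or later, `statistics.correlation` gives the same value.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation via the computational formula S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx**2 / n
    syy = sum(y * y for y in ys) - sy**2 / n
    return sxy / math.sqrt(sxx * syy)

lot_size = [1, 5, 10, 25, 50, 100, 250, 500]       # X
cost = [30, 70, 140, 270, 530, 1000, 2000, 3000]   # Y
print(round(pearson_r(lot_size, cost), 3))
```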
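The simple linear regression model is given by
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where
(a) $E(\varepsilon_i) = 0$
(b) $V(\varepsilon_i) = \sigma^2$ (a constant)
The parameters $\beta_0$ and $\beta_1$ are estimated by the method of least squares. This method minimizes the sum of squares of the random error term, that is, it minimizes
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Hence,
$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i) = 0$$
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)X_i = 0$$
Simplifying yields the normal equations
$$n\beta_0 + \beta_1\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\beta_0\sum_{i=1}^{n} X_i + \beta_1\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_iX_i$$
Solving these gives the least squares estimates
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_iX_i - \frac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} \quad\text{and}\quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$$
where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The fitted model is
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, 2, \ldots, n$$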
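The closed-form estimates can be sketched in a few lines of Python. The data here are hypothetical, chosen to lie exactly on $Y = 2 + 3X$, so the fit should return $\hat{\beta}_0 = 2$ and $\hat{\beta}_1 = 3$; the function name `least_squares` is our own.

```python
def least_squares(xs, ys):
    """Least squares estimates (b0, b1) for Y = b0 + b1 X + error, via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n   # sum YX - (sum Y)(sum X)/n
    sxx = sum(x * x for x in xs) - sx**2 / n                  # sum X^2 - (sum X)^2/n
    b1 = sxy / sxx
    b0 = sy / n - b1 * sx / n                                 # Y-bar - b1 * X-bar
    return b0, b1

# Hypothetical points lying exactly on Y = 2 + 3X, so the estimates should be (2, 3).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]
b0, b1 = least_squares(xs, ys)
print(round(b0, 6), round(b1, 6))
```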
grams). Let
63.86
63.76
60.15
66.66
71.66
70.81
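$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 7$$
The least squares estimates of $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{7} Y_iX_i - \frac{\left(\sum Y_i\right)\left(\sum X_i\right)}{n}}{\sum_{i=1}^{7} X_i^2 - \frac{\left(\sum X_i\right)^2}{n}} = \frac{8.6614}{0.9771} = 8.8644$$
and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$, so the fitted model is
$$\hat{Y}_i = 50.784 + 8.864X_i, \quad i = 1, 2, \ldots, 7$$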
Example 5.15
A study was conducted to determine the relationship between bridge pier scour depths,
52.51
52.04
22.58
8.51
12.62
9.76
8.54
13.87
11.99
10.33
8.36
8.24
20.73
11.24
8.80
12.44
25.56
7.39
6.71
13.28
11.48
8.71
4.94
10.07
13.22
11.21
2.61
13.21
6.49
6.42
7.78
11.85
9.78
7.48
5.50
7.13
6.85
4.00
4.07
4.08
1.62
7.72
4.68
3.40
4.00
3.18
Solution
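The proposed model is given by
$$D = \beta_0 q^{\beta_1}$$
The above model can be transformed into a simple linear regression model by taking natural logarithms as follows:
$$\ln D = \ln\left(\beta_0 q^{\beta_1}\right)$$
$$\ln D = \ln\beta_0 + \ln q^{\beta_1}$$
$$\ln D = \ln\beta_0 + \beta_1\ln q$$
Letting $Y_i = \ln D_i$, $X_i = \ln q_i$ and $\beta_0^* = \ln\beta_0$ gives the simple linear regression model $Y_i = \beta_0^* + \beta_1 X_i + \varepsilon_i$, with the transformed data: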
Yi     Xi     Yi     Xi     Yi     Xi     Yi     Xi
3.57   3.96   2.54   2.48   3.03   3.24   2.44   2.58
3.46   3.95   2.28   2.34   2.42   2.00   2.16   2.42
2.88   3.12   2.14   2.12   2.17   1.90   1.60   0.96
2.68   2.14   2.63   2.11   2.52   2.59   2.31   2.58
2.54   2.41   2.45   1.84   2.22   1.87   1.70   0.48
2.62   2.62   2.97   3.09   2.28   1.86   1.96   2.04
2.56   2.66   2.48   2.41   2.44   2.05   1.92   1.54
2.96   2.22   2.62   2.92   2.42   2.47   1.39   1.22
2.48   2.15   2.48   2.61   2.35   2.28   1.40   1.39
2.71   2.44   2.55   2.77   2.25   2.01   1.41   1.16
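The least squares estimates of $\beta_0^*$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{40} Y_iX_i - \frac{\left(\sum Y_i\right)\left(\sum X_i\right)}{n}}{\sum_{i=1}^{40} X_i^2 - \frac{\left(\sum X_i\right)^2}{n}} = \frac{11.6408}{19.0885} = 0.6098$$
and
$$\hat{\beta}_0^* = \bar{Y} - \hat{\beta}_1\bar{X} = 2.3997 - 0.6098(2.2757) = 1.012$$
Since $\hat{\beta}_0^* = \ln\hat{\beta}_0$, we have $\hat{\beta}_0 = e^{\hat{\beta}_0^*} = e^{1.012} = 2.7511$, so the fitted model is
$$\hat{D} = 2.7511\,q^{0.6098}$$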
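Calculations
$n = 40$
$$\bar{Y} = \frac{1}{40}\sum_{i=1}^{40} Y_i = 2.3997, \qquad \bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i = 2.2757$$
$$\sum_{i=1}^{40} Y_iX_i = (3.57)(3.96) + (3.46)(3.95) + \cdots + (1.40)(1.39) + (1.41)(1.16) = 230.09$$
$$\sum_{i=1}^{40} X_i^2 = 3.96^2 + 3.95^2 + 3.12^2 + \cdots + 1.22^2 + 1.39^2 + 1.16^2 = 226.25$$
$$\sum_{i=1}^{40} X_i = 91.03, \qquad \frac{\left(\sum_{i=1}^{40} X_i\right)^2}{40} = \frac{8286.4609}{40} = 207.1615$$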
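The log transformation turns the power-law fit into an ordinary least squares problem. The sketch below is illustrative only: `fit_power_law` is a hypothetical helper, and the data are noise-free values generated from $D = 2q^{0.6}$ (not the scour measurements), so the fit should recover $\hat{\beta}_0 = 2$ and $\hat{\beta}_1 = 0.6$. The same function could be applied to measured $q$ and $D$ values.

```python
import math

def fit_power_law(q_values, d_values):
    """Fit D = b0 * q**b1 by least squares on Y = ln D versus X = ln q.

    Returns (b0, b1), where b0 = exp(intercept of the log-log line).
    """
    xs = [math.log(q) for q in q_values]
    ys = [math.log(d) for d in d_values]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx**2 / n
    b1 = sxy / sxx
    b0_star = sy / n - b1 * sx / n      # intercept on the log scale, ln(b0)
    return math.exp(b0_star), b1

# Hypothetical noise-free data generated from D = 2.0 * q**0.6, so the fit should recover (2.0, 0.6).
q = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
d = [2.0 * qi**0.6 for qi in q]
b0, b1 = fit_power_law(q, d)
print(round(b0, 4), round(b1, 4))
```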