Statistics
Text reference: Lecture Notes Series: Engineering Mathematics Volume 1, 2nd Edition, Pearson, 2006.
5.1 Sampling Theory
A sample is good when it represents its population well. With a good sample, instead of
examining the whole population to learn about its characteristics, we can use the sample
to make inferences about the population. Since we are going to be dealing with samples,
we shall delve into some of the underlying theory.
5.1.1 Random variable
As noted earlier, a random experiment is an experiment whose outcomes cannot be
determined with certainty. A random variable is basically a measurement from a random
experiment. Its formal definition follows:
Let S be a sample space. A real-valued function X defined on S is called a random
variable, i.e., $X : S \to \mathbb{R}$.
Random variables are usually denoted by capital letters, e.g., X, Y, Z, and their
values by the corresponding lower-case letters x, y, z.
The set containing all possible values of X is called the range of X and is denoted
by $R_X$.
5.1.3 Statistic
Instead of looking at each element in a sample, we derive certain statistics from the
sample that give us an idea of its characteristics. For example, the mean is a statistic
that tells us the location of the sample, while the variance tells us its spread.
A more formal definition is given below:
The random variable $Y = H(X_1, X_2, X_3, \ldots, X_n)$ is called a statistic, where H is a
real function and $X_1, X_2, X_3, \ldots, X_n$ denote a random sample of size n from a given
distribution.
(b)
or $S^2 = \dfrac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
Theorem 5.1.1
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size n from the normal distribution
$N(\mu, \sigma^2)$. Then the sample mean, $\bar{X}_n$, is also normally distributed with
$E(\bar{X}_n) = \mu$ and $\mathrm{var}(\bar{X}_n) = \dfrac{\sigma^2}{n}$,
where $\bar{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.
From the above, note that as n increases, the variance of the sample mean decreases, which
means that the sample mean tends to lie closer to the population mean. This fact is
highlighted by the important theorem below:
The normal approximation for $\bar{X}$ is good if $n \ge 30$ for any population. But for
$n < 30$, the normal approximation for $\bar{X}$ is only good for a population that is
more or less normal.
Example 5.1.1
A factory produces bulbs whose lifetime is approximately normally distributed with
mean 600 hours and standard deviation 18 hours. Find the probability that the average
lifetime is less than 585 hours for a sample of size n = 9.
Solution:
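Let $\bar{X}$ be the average lifetime of the bulbs. Then
$\mu_{\bar{X}} = 600$, $\sigma_{\bar{X}} = \dfrac{18}{3} = 6$.
$P(\bar{X} < 585) = P\left(Z < \dfrac{585 - 600}{6}\right)$
$= P(Z < -2.5)$
$= 1 - P(Z < 2.5)$
$= 0.00621$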
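The calculation in Example 5.1.1 can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 18, 9   # population mean, population sd, sample size

# Sampling distribution of the sample mean: N(mu, sigma^2 / n)
sampling_dist = NormalDist(mu, sigma / sqrt(n))

# P(sample mean < 585)
prob = sampling_dist.cdf(585)
print(round(prob, 5))
```

The standard deviation passed to `NormalDist` is that of the sample mean (18/3 = 6), not that of a single bulb.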
When we have a sample of size $n < 30$ that comes from a normal population with
$\sigma^2$ unknown, it is better to work with a t-distribution, hence the
definition below:
Definition 5.1.1 (t distribution)
If $\bar{X}$ and $S^2$ are the mean and variance of a random sample of size n chosen from a
normal population with mean $\mu$ and variance $\sigma^2$, then
$T = \dfrac{\bar{X} - \mu}{\hat{\sigma}_{\bar{X}}}$ has a t distribution
with $n - 1$ degrees of freedom, where $\hat{\sigma}_{\bar{X}} = \dfrac{S}{\sqrt{n}}$.
Remarks: The t distribution approaches the standard normal distribution as the
sample size $n \to \infty$. Also note that since we do not know the population variance,
$\sigma^2$, we approximate it using the sample variance, $s^2$.
Example 5.1.2
Suppose a manufacturer is interested in the average production of a machine in a day;
more specifically, he is interested in the probability of the machine producing on
average more than 100 items per day. It is known that the machine has a normal
distribution with mean $\mu$ and variance $\sigma^2$. The manufacturer measured the
production of 11 machines yielding the following data:
115, 82, 98, 126, 109, 143, 136, 92, 103, 127, 150
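As a hedged illustration, the sample mean and sample variance of the eleven observations can be computed as follows. This sketch only summarises the data; it does not attempt the probability asked for in the example. The variable names are assumptions, and the second variance line uses the computational form $[n\sum x_i^2 - (\sum x_i)^2]/[n(n-1)]$.

```python
from statistics import mean, variance

data = [115, 82, 98, 126, 109, 143, 136, 92, 103, 127, 150]
n = len(data)

x_bar = mean(data)

# Definition form: sum((x - x_bar)^2) / (n - 1)
s2_definition = variance(data)

# Computational form: [n * sum(x^2) - (sum(x))^2] / [n (n - 1)]
s2_shortcut = (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))

print(x_bar, s2_definition, s2_shortcut)
```

Both variance expressions agree up to floating-point rounding, which is why either can be used in hand calculations.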
5.2 Statistical Inference
5.2.1.1 Point Estimator: A sample statistic whose value gives a single numerical
estimate of an unknown parameter.
Remark: The statistic $\bar{X}$ (computed from a sample) is a point estimator of the
population parameter $\mu$, and the sample variance $s^2$ is a point estimator of $\sigma^2$.
In estimating a parameter, we have to be sure that the statistic that we plan to use is an
unbiased estimator of the parameter. The definition that follows tells us how to
determine this.
Definition 5.2.1.1
A statistic $\hat{\Theta}$ is said to be an unbiased estimator of the parameter $\theta$ if
$E(\hat{\Theta}) = \theta$. Otherwise, it is said to be biased.
Example 5.2.1.1
$S^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$ is an unbiased estimator of the parameter $\sigma^2$.
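The unbiasedness of $S^2$ can be illustrated exactly on a small case. The sketch below assumes a hypothetical population in which four values are equally likely (these values are not from the notes) and enumerates every possible sample of size 3 drawn with replacement. Averaging $S^2$ over all samples gives $E(S^2)$, which equals $\sigma^2$, whereas dividing by n instead of n - 1 gives a biased result.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

values = [1, 2, 3, 6]   # hypothetical population values, each equally likely
n = 3                   # sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# Every ordered sample of size n is equally likely under sampling with replacement
samples = list(product(values, repeat=n))
expected_s2 = sum(sample_variance(s, 1) for s in samples) / len(samples)
expected_biased = sum(sample_variance(s, 0) for s in samples) / len(samples)

print(sigma2, expected_s2, expected_biased)
```

Exact fractions are used so that the equality $E(S^2) = \sigma^2$ holds without rounding error.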
Definition 5.2.1.2
Among all possible unbiased estimators of some parameter $\theta$, the one with the
smallest variance is called the most efficient estimator of $\theta$.
If we say that a distance is measured as 5.28 mm, we are giving a point estimate. If,
on the other hand, we say that the distance is 5.28 ± 0.03 mm, i.e. the distance lies
between 5.25 and 5.31 mm, we are giving an interval estimate.
5.2.1.2 Interval estimator: A random interval in which the true value of the
parameter falls with some level of probability.
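5.2.1.2.1 Estimating the Mean (Confidence Interval for $\mu$)
(a) Normal population and $\sigma$ known
If $\bar{x}$ is the mean of a random sample of size n from a population with known
variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$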
Example 5.2.1.2
Measurements of the weights of a random sample of 200 containers made by a certain
machine showed a mean of 0.21 kilograms and it is known that $\sigma$ = 0.002 kilograms.
Find the 95% confidence interval for the mean weight of all the containers.
Solution:
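n = 200, $(1-\alpha)100\% = 95\% \Rightarrow \alpha = 0.05$, $\sigma = 0.002$, $z_{0.025} = 1.96$.
The 95% confidence interval is
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 0.21 \pm 1.96\dfrac{0.002}{\sqrt{200}} = 0.21 \pm 0.0002772$
or $0.2097 < \mu < 0.2103$.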
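The interval in Example 5.2.1.2 can be reproduced with a short sketch. The helper name `z_confidence_interval` is an assumption made for illustration; the critical value $z_{\alpha/2}$ is obtained from the standard library's `NormalDist.inv_cdf` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = z_confidence_interval(x_bar=0.21, sigma=0.002, n=200, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```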
Remark: What does this interval signify?
(b) Large sample with unknown variance.
Example 5.2.1.3
The average calcium content in 36 samples taken from different locations in a river is
found to be 1.6 grams per millilitre. Find the 95% confidence interval for the mean
calcium content in the river. Assume that the sample standard deviation is 0.2.
Solution:
The point estimate for $\mu$ is $\bar{x} = 1.6$.
The z-value, leaving an area of 0.025 to the right and therefore an area of 0.975 to the
left, is $z_{0.025} = 1.96$.
Hence the 95% confidence interval is
$1.6 - 1.96\dfrac{0.2}{\sqrt{36}} < \mu < 1.6 + 1.96\dfrac{0.2}{\sqrt{36}}$
which reduces to $1.5347 < \mu < 1.6653$.
(c)
Example 5.2.1.4
A random sample of 12 clerks of a certain company typed an average of 75.6 words
per minute with a standard deviation of 8.2 words per minute. Find a 95% confidence
interval for the average number of words typed by all clerks of this company.
(Assuming a normal distribution for the number of words typed per minute)
Solution:
n = 12, $\bar{x} = 75.6$, s = 8.2, $t_{0.025} = 2.201$ with 11 degrees of freedom. Hence, the
confidence interval for $\mu$ is
$75.6 - 2.201\dfrac{8.2}{\sqrt{12}} < \mu < 75.6 + 2.201\dfrac{8.2}{\sqrt{12}}$
$70.38993 < \mu < 80.81007$.
Example 5.2.1.5
A machine is producing containers that are cylindrical in shape. Nine containers are
randomly chosen and the diameters are 10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99,
10.01, and 10.03 centimetres. Find a 99% confidence interval for the mean diameter
of containers from this machine, assuming an approximate normal distribution.
Solution:
n = 9.
$\bar{x} = \dfrac{10.01 + 9.97 + 10.03 + 10.04 + 9.99 + 9.98 + 9.99 + 10.01 + 10.03}{9} = 10.0056$.
$s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = 0.0246$.
$t_{0.005} = 3.355$ with 8 degrees of freedom. Hence, the confidence interval is
$10.0056 - (3.355)\dfrac{0.0246}{3} < \mu < 10.0056 + (3.355)\dfrac{0.0246}{3}$
$9.9781 < \mu < 10.0331$.
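The steps of Example 5.2.1.5 can be sketched in code. The Python standard library has no t distribution, so the critical value 3.355 is taken from the example's table lookup instead of being computed; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

diameters = [10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99, 10.01, 10.03]
t_crit = 3.355   # t_{0.005} with 8 degrees of freedom (table value from the example)

n = len(diameters)
x_bar = mean(diameters)
s = stdev(diameters)             # sample standard deviation (divisor n - 1)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(x_bar, 4), round(s, 4), round(lower, 4), round(upper, 4))
```

Because the code carries full precision for $\bar{x}$ and s, the endpoints may differ from the hand calculation in the last decimal place.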
5.2.1.2.2
proportion)
to right.
2
Example 5.2.1.6
In a random sample of n = 600 families owning television sets in a city, it is found
that x = 240 subscribed to ASTRO. Find a 95% confidence interval for the actual
proportion of families in this city who subscribe to ASTRO.
Solution:
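$\hat{p} = \dfrac{240}{600} = 0.4$. Using the statistical table, we found that
$z_{0.025} = 1.96$. Therefore, the 95% confidence interval for p is
$0.4 - 1.96\sqrt{\dfrac{(0.4)(0.6)}{600}} < p < 0.4 + 1.96\sqrt{\dfrac{(0.4)(0.6)}{600}}$
which can be simplified to 0.3608 < p < 0.4392.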
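A minimal sketch of the large-sample interval for a proportion used in Example 5.2.1.6 follows. The function name `proportion_confidence_interval` is hypothetical, and the normal critical value is computed with `NormalDist.inv_cdf` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample (normal approximation) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(successes=240, n=600, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```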
Remark: Take note that the confidence intervals built so far are two-sided
confidence intervals (for the mean and the proportion). What if you are interested in only
the lower (or upper) bound of the mean $\mu$?
$\mu > \bar{x} - z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$
More specifically, the one-sided confidence interval given above is bounded below.
Remark: Similarly, $\bar{x} + z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$ will form the upper bound for $\mu$.
What will the one-sided confidence interval corresponding to this upper bound look like?
5.2.1.2.3
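$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$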
Example 5.2.1.7
The following are the weights, in grams, of 10 packages of sugars packed by a
worker:
454, 451, 458, 450, 451, 459, 458, 459, 452 and 450.
Find a 95% confidence interval for the variance of all such packages of sugars packed
by this worker, assuming a normal population.
Solution:
First we find
$s^2 = \dfrac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} = \dfrac{(10)(2063112) - (4542)^2}{(10)(9)} = 15.0667$.
To obtain a 95% confidence interval, we have $\alpha = 0.05$.
Then, using the statistical table with 9 degrees of freedom, we found that
$\chi^2_{0.025} = 19.023$, $\chi^2_{0.975} = 2.700$. Therefore, the 95% confidence interval for $\sigma^2$ is
$\dfrac{(9)(15.0667)}{19.023} < \sigma^2 < \dfrac{(9)(15.0667)}{2.700}$, or simply $7.1282 < \sigma^2 < 50.2223$.
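The arithmetic of Example 5.2.1.7 can be checked with the sketch below. The two chi-square values are the table values quoted in the example (the standard library has no chi-square distribution), and the variable names are assumptions.

```python
weights = [454, 451, 458, 450, 451, 459, 458, 459, 452, 450]

chi2_right = 19.023   # chi-square_{0.025} with 9 degrees of freedom (table value)
chi2_left = 2.700     # chi-square_{0.975} with 9 degrees of freedom (table value)

n = len(weights)
sum_x = sum(weights)
sum_x2 = sum(w * w for w in weights)

# Computational form of the sample variance
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

lower = (n - 1) * s2 / chi2_right
upper = (n - 1) * s2 / chi2_left
print(round(s2, 4), round(lower, 4), round(upper, 4))
```

Note that the larger chi-square value produces the lower endpoint, because it appears in the denominator.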
Example 5.2.2.1
Suppose a manufacturer observes that the existing procedure yields about 4%
defective products. The engineer would like to implement a new procedure to reduce
the number of defective products. It was agreed that n = 100 products would be
produced using the new procedure. Let X equal the number of these 100 products that
are defective. Thus we have the following hypotheses:
$H_0 : p = 0.04$
$H_1 : p < 0.04$
We would like to reject $H_0$ and accept $H_1$, so that the number of defective products is
reduced. Since a sample of 100 is taken, it is reasonable to accept $H_1$ if $X < 4$. If
$X \ge 4$, then we accept $H_0$.
through the application of probability theory, and is rejected otherwise. The set of
values for the test statistic that result in the acceptance of the null hypothesis is called
the acceptance region; the set of values that support the rejection of the null
hypothesis is called the critical region (rejection region).
When dealing with hypothesis testing, there is always a chance for an error to occur,
namely the error of accepting the new procedure as an improvement when, in fact, it
is not, or the error of rejecting the new procedure as an improvement when, in fact, it
is one.
(d)
Example 5.2.2.2
Referring to Example 5.2.2.1, we would accept the new procedure as being an
improvement when, in fact, it was not. This decision is a mistake which we call a
type I error.
$\sum_{x=0}^{3} b(x;\,100,\,0.04)$
$1 - \sum_{x=0}^{3} b(x;\,100,\,0.02)$
Example 5.2.2.4
Referring to Example 5.2.2.1, the function $L(p)$ can be calculated as follows:
$L(p) = P(\text{acceptance of } H_0)$
$= P(X \ge 4)$
$= 1 - \sum_{x=0}^{3} b(x;\,100,\,p)$
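The function $L(p)$ can be evaluated numerically. The sketch below assumes that $b(x; n, p)$ denotes the binomial probability $\binom{n}{x} p^x (1-p)^{n-x}$ and that $H_0$ is accepted when $X \ge 4$, as in Example 5.2.2.1; the function names are illustrative. The quantity `alpha` is the probability of accepting $H_1$ when p = 0.04, read here as the type I error probability.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x defectives among n products."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def prob_accept_h0(p, n=100, cutoff=4):
    """L(p) = P(X >= cutoff) = 1 - sum_{x=0}^{cutoff-1} b(x; n, p)."""
    return 1 - sum(binom_pmf(x, n, p) for x in range(cutoff))

# Probability of accepting H1 (X < 4) when p is really 0.04
alpha = 1 - prob_accept_h0(0.04)

for p in (0.01, 0.02, 0.04, 0.06):
    print(p, round(prob_accept_h0(p), 4))
print("alpha =", round(alpha, 4))
```

Evaluating `prob_accept_h0` over a range of p values shows how the chance of retaining $H_0$ falls as the true defect rate drops below 4%.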
In tests of hypotheses, we have one- and two-tailed tests, which are stated as
follows:
Hypothesis tests     Two-sided        One-sided (left side)    One-sided (right side)
Symbol in H0         =                = or ≥                   = or ≤
Symbol in H1         ≠                <                        >
Rejection region     In both tails    In the left tail         In the right tail
Example 5.2.2.5
A manufacturer of a certain brand of bulb claims that the average lifetime is more
than 1.5 years. State the null and alternative hypotheses to be used in testing this
claim and determine where the critical region is located.
Solution:
The manufacturer's claim should be rejected only if $\mu$ is less than or equal to 1.5 years
and should be accepted if $\mu$ is more than 1.5 years. Since the null hypothesis always
specifies a single value of the parameter, we test
$H_0 : \mu = 1.5$,
$H_1 : \mu < 1.5$.
Although we have stated the null hypothesis with an equal sign, it is understood to
include any value not specified by the alternative hypothesis. Consequently, the
acceptance of $H_0$ does not imply that $\mu$ is exactly equal to 1.5 years but rather that we
do not have sufficient evidence favouring $H_1$. Since we have a one-tailed test, the
less-than symbol indicates that the critical region lies entirely in the left tail of the
distribution of our test statistic $\bar{X}$.
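Testing a population mean
(a) Test of hypothesis about a population mean
normal population and $\sigma$ known
large sample and $\sigma$ unknown
One-Tailed Test
H0: $\mu = \mu_0$
H1: $\mu > \mu_0$ (or H1: $\mu < \mu_0$)
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (for normal population and $\sigma$ known) or
$z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ (for large sample and $\sigma$ unknown)
Critical region: $z > z_{\alpha}$ (or $z < -z_{\alpha}$),
where $z_{\alpha}$ is the Z-value such that $P(Z > z_{\alpha}) = \alpha$ and $s^2$ is the sample
variance.
Two-Tailed Test
H0: $\mu = \mu_0$
H1: $\mu \ne \mu_0$
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ or $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Critical region: $|z| > z_{\alpha/2}$.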
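Using the p-value approach for a Single Mean test (for $\sigma$ known)
The probability value, more commonly called the p-value, is the smallest
significance level at which the null hypothesis is rejected.
Calculate the p-values:
(i) One-tailed test (left-tailed): $p = P\left(Z < \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$
(ii) One-tailed test (right-tailed): $p = P\left(Z > \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$
(iii) Two-tailed test: $p = 2P\left(Z > \left|\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right|\right)$.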
Example 5.2.2.6
A random sample of 100 electronic chips showed an average lifetime of 2.8 years.
Assuming a population standard deviation of 0.5 year, does this seem to indicate that
the mean lifetime is greater than 2.7 years? Use a 0.05 level of significance.
Solution:
Using test statistic:
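1. $H_0 : \mu = 2.7$ years.
2. $H_1 : \mu > 2.7$ years.
3. $\alpha = 0.05$.
4. Critical region: $z > 1.645$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
5. Computations: $z = \dfrac{2.8 - 2.7}{0.5/\sqrt{100}} = 2$.
6. Decision: Reject $H_0$ and conclude that the mean lifetime is greater than 2.7 years.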
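The test in Example 5.2.2.6 can be reproduced with both the critical-region and p-value approaches. This is a sketch with illustrative variable names, using the standard library's `NormalDist` for the critical value and tail probability.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 2.8, 2.7, 0.5, 100, 0.05

# Test statistic for a mean with sigma known
z = (x_bar - mu0) / (sigma / sqrt(n))

# Right-tailed test: reject H0 when z exceeds z_alpha
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)        # P(Z > z)

reject_h0 = z > z_crit
print(round(z, 3), round(z_crit, 3), round(p_value, 4), reject_h0)
```

Both criteria agree: z exceeds the critical value exactly when the p-value is below the significance level.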
Example 5.2.2.7
A manufacturer developed a new type of battery that he claims has a mean lifetime of
10 months with a standard deviation of 0.5 month. Test the hypothesis that
$\mu = 10$ months against the alternative that $\mu \ne 10$ months if a random sample of 50
batteries is tested and found to have a mean lifetime of 9.8 months. Use a 0.01 level of
significance.
Solution:
Using test statistic:
1. $H_0 : \mu = 10$ months.
2. $H_1 : \mu \ne 10$ months.
3. $\alpha = 0.01$.
4. Critical region: $|z| > z_{0.005}$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
5. Computations: $\bar{x} = 9.8$ months, n = 50, and hence $z = \dfrac{9.8 - 10}{0.5/\sqrt{50}} = -2.83$.
6. Decision: Reject $H_0$ and conclude that the average lifetime is not equal to 10
months but is, in fact, less than 10 months.
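For the two-tailed test in Example 5.2.2.7, the p-value is the total area in both tails beyond the observed |z|. The following sketch (variable names are assumptions) computes it with the standard library.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 9.8, 10, 0.5, 50, 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed p-value: twice the area beyond |z| in one tail
p_value = 2 * NormalDist().cdf(-abs(z))

reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```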
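[Figure: standard normal curve with a tail area of p/2 to the left of -2.83 and a tail area of p/2 to the right of 2.83]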
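One-Tailed Test
H0: $\mu = \mu_0$
H1: $\mu > \mu_0$ (or H1: $\mu < \mu_0$)
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Critical region: $t > t_{\alpha}$ (or $t < -t_{\alpha}$),
where $t_{\alpha}$ is the t value such that $P(T > t_{\alpha}) = \alpha$ with $n - 1$ degrees of freedom.
Two-Tailed Test
H0: $\mu = \mu_0$
H1: $\mu \ne \mu_0$
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Critical region: $|t| > t_{\alpha/2}$ with $(n - 1)$ degrees of freedom.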
Example 5.2.2.8
The height of students in University ABC is normally distributed, and a lecturer
claims that the mean height of these students is 1.68 metres. To test this claim,
another lecturer takes a random sample of 16 students and finds that the mean is 1.71
metres and the standard deviation is 0.05 metres. Can the claim made by the lecturer
be accepted at the 0.05 level of significance?
Solution:
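1. $H_0 : \mu = 1.68$ metres.
2. $H_1 : \mu \ne 1.68$ metres.
3. $\alpha = 0.05$, $\alpha/2 = 0.025$.
4. Critical region: $t < -2.131$ or $t > 2.131$, where $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$.
5. Computations: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{1.71 - 1.68}{0.05/\sqrt{16}} = 2.4$.
6. Decision: Reject $H_0$.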
Example 5.2.2.9
Test the hypothesis that the average diameter of a certain type of battery produced by a
factory is 10 millimetres if the diameters of a random sample of 10 batteries are 10.1,
9.8, 10.1, 10.5, 10.1, 9.7, 9.9, 10.4, 10.3 and 9.8 millimetres. Use a 0.01 level of
significance and assume that the distribution of diameters is normal.
Solution:
1. $H_0 : \mu = 10$.
2. $H_1 : \mu \ne 10$.
3. $\alpha = 0.01$.
4. Critical region: $t < -3.25$ or $t > 3.25$, where $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$.
5. Computations: $\bar{x} = 10.07$, $s = 0.271$, and hence $t = \dfrac{10.07 - 10}{0.271/\sqrt{10}} \approx 0.82$.
6. Decision: Do not reject $H_0$.
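The quantities in Example 5.2.2.9 can be computed directly from the ten listed diameters. In this sketch the critical value 3.25 is the table value quoted in the example (t with 9 degrees of freedom), since the standard library has no t distribution; the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

diameters = [10.1, 9.8, 10.1, 10.5, 10.1, 9.7, 9.9, 10.4, 10.3, 9.8]
mu0 = 10
t_crit = 3.25    # t_{0.005} with 9 degrees of freedom (table value from the example)

n = len(diameters)
x_bar = mean(diameters)
s = stdev(diameters)                     # sample standard deviation (divisor n - 1)

t = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = abs(t) > t_crit              # two-tailed critical region
print(round(x_bar, 3), round(s, 3), round(t, 3), reject_h0)
```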
Two-Tailed Test
H0: $p = p_0$
H1: $p \ne p_0$
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$
Critical region: $|z| > z_{\alpha/2}$
The sample size n must be sufficiently large for the approximation to be valid.
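Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
Two-Tailed Test
$H_0 : \sigma^2 = \sigma_0^2$
$H_1 : \sigma^2 \ne \sigma_0^2$
Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
Critical region: $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$.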
Example 5.2.2.12
In paper manufacturing, the process that produces paper is considered out of control
if the standard deviation of the weight of a piece of paper exceeds 1.25 grams. A
random sample of 20 pieces of paper taken during a routine periodic check produced
a sample standard deviation of 1.90 grams. At the 0.05 level, is the paper production
process out of control?
Solution:
1. $H_0 : \sigma^2 = 1.25^2$.
2. $H_1 : \sigma^2 > 1.25^2$.
3. $\alpha = 0.05$.
4. Critical region: $\chi^2 > \chi^2_{0.05}(n-1) = 30.14$, where $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$.
5. Computations: $\chi^2 = \dfrac{(20-1)(1.90)^2}{1.25^2} = 43.8976$.
6. Decision: Since $\chi^2 > \chi^2_{\alpha}$, reject $H_0$. Conclude that the paper production process is
out of control.
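The computation in Example 5.2.2.12 is short enough to verify directly. In this sketch the critical value 30.14 is the table value quoted in the example; the variable names are illustrative.

```python
n = 20            # sample size
s = 1.90          # sample standard deviation (grams)
sigma0 = 1.25     # standard deviation under H0 (grams)
chi2_crit = 30.14 # chi-square_{0.05} with 19 degrees of freedom (table value)

# Test statistic for a variance
chi2 = (n - 1) * s ** 2 / sigma0 ** 2

reject_h0 = chi2 > chi2_crit   # right-tailed critical region
print(round(chi2, 4), reject_h0)
```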
5.3
Example 5.3.1
A die is tossed 180 times and each outcome is noted. Test at the 0.01 level of
significance whether the data obtained from the experiment follow the discrete uniform
distribution.
Solution:
A table as follows is formed:
Face, X    Observed frequencies, o_i    Expected frequencies, e_i    P(X = x)
1          26                           30                           1/6
2          32                           30                           1/6
3          25                           30                           1/6
4          24                           30                           1/6
5          35                           30                           1/6
6          38                           30                           1/6
Total      180                          180                          1
The expected frequencies can be found by multiplying the probabilities 1/6 by 180.
1.
2.
3.
4.
5. Computation: $\chi^2 = \sum_{i=1}^{6} \dfrac{(o_i - e_i)^2}{e_i} = 5.67$.
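The goodness-of-fit statistic for the die data can be computed as in the sketch below. Only the statistic is evaluated; the comparison with a critical value is left to the test steps. Variable names are assumptions.

```python
observed = [26, 32, 25, 24, 35, 38]     # frequencies for faces 1 to 6
n_tosses = sum(observed)

# Under a discrete uniform distribution each face has probability 1/6
expected = [n_tosses / 6] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(n_tosses, round(chi2, 2))
```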
Example 5.3.2
Suppose the lifetime (in hours), X, of 40 bulbs is recorded and classified into a few
classes as follows:
Class boundaries (hours)    Observed frequencies, o_i
1.5 - 2.0                   2
2.0 - 2.5                   4
2.5 - 3.0                   11
3.0 - 3.5                   15
3.5 - 4.0                   7
4.0 - 4.5                   1
Test at level of significance 0.01 whether the lifetime of bulbs may be approximated
by a normal distribution with $\mu = 3.2$ and $\sigma = 0.5$.
Solution:
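A table as follows is formed:
Class boundaries (hours)    Observed frequencies, o_i    Probabilities    Expected frequencies, e_i
1.5 - 2.0                   2                            0.0082           0.328
2.0 - 2.5                   4                            0.0726           2.904
2.5 - 3.0                   11                           0.2638           10.552
  (first three classes combined: o = 17, e = 13.784)
3.0 - 3.5                   15                           0.3811           15.244
3.5 - 4.0                   7                            0.2195           8.78
4.0 - 4.5                   1                            0.0548           2.192
  (last two classes combined: o = 8, e = 10.972)
Total                       40                           1                40
For example,
$P(X < 2.0) = P\left(Z < \dfrac{2.0 - 3.2}{0.5}\right) = P(Z < -2.4) = 0.0082$
$P(X > 4.0) = P\left(Z > \dfrac{4.0 - 3.2}{0.5}\right) = P(Z > 1.6) = 0.0548$.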
The expected frequencies can be found by multiplying the probability for each
class by 40.
Note that we combined some of the classes so that none of the expected frequencies is
less than 5.
1. $H_0$: the random variable X has a normal distribution with $\mu = 3.2$ and $\sigma = 0.5$.
2. $H_1$: the random variable X does not have a normal distribution with $\mu = 3.2$ and
$\sigma = 0.5$.
3. $\alpha = 0.01$.
4. Critical region: $\chi^2 > \chi^2_{0.01}(2)$, where $\chi^2_{0.01}(2) = 9.210$.
5. Computation: $\chi^2 = \sum_{i=1}^{3} \dfrac{(o_i - e_i)^2}{e_i} = 1.559$, where the sum runs over the three classes that remain after combining.
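The whole procedure of Example 5.3.2 (class probabilities, expected frequencies, pooling and the chi-square statistic) can be sketched as follows. The pooling groups mirror the combination described in the example, the critical value 9.210 is the table value quoted there, and the variable names are assumptions. Unrounded normal probabilities are used, so intermediate values may differ slightly from the table-based ones.

```python
from statistics import NormalDist

dist = NormalDist(mu=3.2, sigma=0.5)
edges = [2.0, 2.5, 3.0, 3.5, 4.0]        # interior class boundaries (hours)
observed = [2, 4, 11, 15, 7, 1]
total = sum(observed)

# The first class takes everything below 2.0 and the last everything above 4.0
cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [total * p for p in probs]

# Pool classes so that every expected frequency is at least 5
groups = [(0, 3), (3, 4), (4, 6)]        # index ranges of the classes to combine
obs_pooled = [sum(observed[a:b]) for a, b in groups]
exp_pooled = [sum(expected[a:b]) for a, b in groups]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
chi2_crit = 9.210                        # chi-square_{0.01} with 2 degrees of freedom (table)
reject_h0 = chi2 > chi2_crit
print(obs_pooled, [round(e, 3) for e in exp_pooled], round(chi2, 3), reject_h0)
```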
-end-