(STATISTICAL METHODS)
(Lecture Notes)
Introduction
Definition
Statistics - the science of the systematic collection, tabulation,
presentation, analysis, and interpretation of numerical data.
Collection is the process of obtaining numerical data.
Tabulation or presentation of data refers to the organization of
data into tables, graphs, or charts so that logical and statistical
conclusions can be derived from the collected data.
Analysis of data pertains to the process of extracting relevant
information from which a numerical description can be formulated.
Interpretation of data refers to the task of drawing conclusions
from the analyzed data. It involves a sample derived from the
population.
Process of Collecting Data
1. Questionnaire – indirect or survey
2. Interview – direct
3. Registration
4. Observation
5. Experiment
Sampling Methods
1. Lottery
2. Random Number Table
3. Systematic (every nth element)
4. Stratified or proportionate
5. Cluster or area
6. Multistage
Areas of Statistics
1. Descriptive Statistics – concerned with the problem of describing a
mass of data in a concise, clear, useful, and informative way. This
is done with such techniques as graphing, tabular
presentation, and calculating averages and dispersion.
2. Inferential Statistics – demands a higher order of critical
judgment and mathematical methods. It aims to give information
about a large set of data without dealing with each and every
element of the group. It uses only a small portion of the total set of
data in order to draw conclusions or judgments regarding the
entire population.
Uses of Statistics
Classification of Data
1. One-way classification – has only one variable, described by at least two
categories
Example:
   Civil Status     f
   Single          20
   Married         25
   Widow           15
   Separated       10
   Total           70
2. Two-way classification – there are two variables, each described by its
respective categories
Example: Opinion of Respondents in Regard to RH Bill
                       Opinion
   Gender    Agree   Disagree   Don’t Know   Total
   Male        30       15           5         50
   Female      40       10           5         55
   ________________________________________________
   Total       70       25          10        105
To determine the sample size n:
$$ n = \frac{N}{1 + N e^{2}} $$
where: N = population size
       e = margin/prob. of error
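The sample-size formula can be checked numerically. The following is a minimal Python sketch; the function name `slovin_sample_size` and the choice to round up to the next whole unit are assumptions made for illustration and are not stated in the notes.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e**2), rounded up (rounding up is an assumption)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Illustrative values: N = 1000 with a 5% margin of error.
print(slovin_sample_size(1000, 0.05))
```

A smaller margin of error makes the denominator smaller, so the required sample grows toward N.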
Kinds of Sampling
1. Random or probability sampling – methods of selecting a sample from the
population where all elements have an equal chance of being selected.
Example: N = 1000, n = 100, p = 100/1000 = 1/10 or 10%
Grade 1 - 170
Grade 2 - 130
Grade 3 - 110
Grade 4 - 100
Grade 5 - 80
How many pupils from each grade level will be included in the sample if
the margin of error is 2%?
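One way to approach the grade-level question is sketched below in Python. It assumes that the grade enrolments are the strata, that the overall sample size comes from n = N / (1 + N e^2) rounded up, and that each grade receives a share proportional to its size. The function name `proportional_allocation` is hypothetical, and rounded shares need not always add up exactly to n.

```python
import math

def proportional_allocation(strata, sample_size):
    """Give each stratum a share of the sample proportional to its size."""
    total = sum(strata.values())
    return {name: round(sample_size * size / total)
            for name, size in strata.items()}

grades = {"Grade 1": 170, "Grade 2": 130, "Grade 3": 110,
          "Grade 4": 100, "Grade 5": 80}

N = sum(grades.values())                  # total number of pupils
n = math.ceil(N / (1 + N * 0.02 ** 2))    # sample size at a 2% margin of error
allocation = proportional_allocation(grades, n)
print(n, allocation)
```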
Levels of Measurements
*Nominal Level - We can put the data into categories.
*Ordinal Level - We can order the data from the least to the most. Each
data value can be compared with another data value.
*Interval Level - We can order the data and also take differences between
data values. At this level, it makes sense to compare the differences
between data values.
*Ratio Level – We can order the data, take the differences, and take the ratio
between data values. For instance, it makes sense to say that one
data value is twice as large as another data value.
Exercise No. 1
Mean - population mean: $\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$; sample mean: $\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$
Median - The middle value of a set of observations arranged in
increasing or decreasing order of magnitude, if the number of
items is odd; the arithmetic mean of the two middle values
when the number of observations is even. It is an ordinal statistic.
Mode - The value that occurs most often, or the value with the greatest frequency.
It may or may not exist. It is a nominal statistic. If there are two modes,
the set of observations is bimodal.
2. Measures of Variation
The three measures of central location do not by themselves give an
adequate description of our data. We need to know how the observations
spread out from the mean.
2a. Range – difference between the largest and smallest measure. It is a
poor measure of variation if the size of the sample or population is
large. It considers only the extreme values and tells nothing
about the distribution of data in between.
2b. Variance – a measure of variation/deviation from the mean. An
observation greater than the mean produces a positive deviation,
whereas an observation smaller than the mean
produces a negative deviation. The variance is the average of the squares
of the deviations of individual values from the mean.
2c. Standard Deviation – the positive square root of the variance. It is a special
form of average deviation from the mean: the positive square root of the
arithmetic mean of the squared deviations from the mean. It is a
measure of the heterogeneity and unevenness within the set of
observations.
Population Variance, $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample Variance, $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Problem illustration:
$$ \sigma^2 = \frac{(7-7)^2 + (5-7)^2 + (9-7)^2 + (8-7)^2 + (7-7)^2 + (6-7)^2}{6} = \frac{10}{6} = 1.67 $$
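The illustration can be reproduced with Python's standard `statistics` module. This is a sketch under an assumption: the six observations are taken to be 7, 5, 9, 8, 7, 6, where the one value that is not legible in the notes is set to 7 so that the mean is 7 and the result matches 1.67.

```python
import statistics

# Assumed data: five values are legible in the notes; the sixth (7) is inferred.
scores = [7, 5, 9, 8, 7, 6]

mean = statistics.mean(scores)
pop_var = statistics.pvariance(scores)     # population variance, divides by N
pop_sd = statistics.pstdev(scores)         # positive square root of pop_var
sample_var = statistics.variance(scores)   # sample variance, divides by n - 1

print(mean, round(pop_var, 2), round(pop_sd, 2), round(sample_var, 2))
```

The population and sample versions differ only in the divisor, which matters most when the number of observations is small.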
Problem Exercises:
1. Compute the mean, median, variance, and standard deviation of the
following scores of 12 students in Math 1 E:
20, 16, 19, 17, 18, 14, 15, 13, 11, 10, 12, 19
2. Two samples of bottled fruit juices are on display, one bottled by A and the
other by B. If you are to choose between A and B fruit juices, which one is
your choice?
Sample A: 0.95 li. 1.0 li. 0.93 li. 1.02 li. 1.10 li.
Sample B: 1.06 li. 1.01 li. 0.88 li. 0.91 li. 1.14 li.
Calculate the following: (a) mean (b) median (c) mode (d) variance
(e) standard deviation
Sample A: 0.95 li. 1.0 li. 0.93 li. 1.02 li. 1.10 li.   $\bar{X}_A$ = 1.0 li.
Sample B: 1.06 li. 1.01 li. 0.88 li. 0.91 li. 1.14 li.   $\bar{X}_B$ = 1.0 li.
Calculate the coefficients of variation.
B. Problem Exercise: The following data give the ages of a sample of Grade 3 and
Grade 4 pupils of A-1 Elem. School
Grade 3 7 8 8 9 7 8 9 8 7 8
Grade 4 9 10 8 9 10 10 9 7 8 10
Calculate the following: Mean, Median, Mode, Variance, and Coef. of
Variation (CV). Which group of pupils is more variable?
Example. Suppose a set of data has a mean of 150 and std. deviation of 25. By
Chebyshev’s theorem, the probability is at least $1 - \frac{1}{K^2} = .75$ that X will take
on a value within K = 2 standard deviations of the mean. Since 2 SD = 50, the
probability is at least 75% that X will take on a value between 150 – 50 and
150 + 50, or 100 - 200. Consequently, we can say that at least 75% of the
values are found between 100 and 200, that is, within the specified number of
standard deviations (K SD) of the mean, $\bar{X} \pm K \cdot SD$.
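The bound and the interval in this example can be computed for any K. The Python sketch below uses hypothetical function names (`chebyshev_bound`, `chebyshev_interval`) and the mean of 150 and standard deviation of 25 from the example.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Interval mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

for k in (2, 3, 4):
    print(k, chebyshev_bound(k), chebyshev_interval(150, 25, k))
```

The bound holds for any distribution, which is why it is only a minimum: a particular data set may have far more of its values inside the interval.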
3. Z-Score – measures the number of std. deviations the variable X is from the
mean. It is a measure of the relative location of the observation in a data set.
Observations in two different data sets with the same Z-score can be said
to have the same relative location in terms of the same number of
standard deviations from the mean.
$$ Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad Z = \frac{X - \bar{X}}{S} $$
If the Z-score is positive, the observation lies above the mean.
Problem Illustration.
Let us assess the accomplishment of a student in Math 1 and Science 1. The
student’s score is 82 in Math 1 and 89 in Science 1. Can we conclude that the
student is better in Science 1?
Soln. We should consider how the student performed relative to other students
in each class. For Math 1 the mean score is 68 with SD of 8, while in the Science 1
class the mean score is 80 with SD of 6.
Math 1: µ = 68, σ = 8, X = 82; $Z = \frac{X - \mu}{\sigma} = \frac{82 - 68}{8} = +1.75$
Science 1: µ = 80, σ = 6, X = 89; $Z = \frac{X - \mu}{\sigma} = \frac{89 - 80}{6} = +1.50$
Since the Z-score of the student in Math 1 is larger than his Z-score in Science 1, the
student’s relative performance in Math 1 is better than his performance in
Science 1.
Conclusion: The student is a better student in Math 1.
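The comparison above reduces to two evaluations of the Z-score formula. A minimal Python sketch follows; the function name `z_score` is chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_math = z_score(82, 68, 8)
z_science = z_score(89, 80, 6)

print(z_math, z_science)
print("Better relative performance:", "Math 1" if z_math > z_science else "Science 1")
```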
Problem Exercises:
1. Let us assess the encoding speed of an applicant to see whether he is suited for the
Dean’s Office, Bus. Office, or Personnel Office.
2. Two college students took final exams in Sociology from two
different professors. Their scores are as follows:
                      John’s Class    Bill’s Class
   Class mean score        75              88
   Student’s score         85 (John)       92 (Bill)
   Std. Dev.               6.5             4.0
6. Suppose a set of data has a mean of 150 and std. dev. of 25. Between what
values will X fall if K = 2; K = 3; K = 4?
   Class interval    f
   10 – 14           2
   15 – 19           4
   20 – 24           9
   25 – 29           6
   30 – 34           5
In each class interval, the first number is the lower class limit and the
second number is the upper class limit.
(1) Range = H – L (highest measure minus lowest measure)
(2) Decide on the number of class intervals (c.i.). Use Sturges’ rule unless the number of
c.i. is specified. No. of c.i. = 1 + 3.32 log10 N
(3) Class width, W = Range / no. of c.i.; W should have the same number of
significant places as the data measures
(4) Construct the frequency table with the first class interval beginning with the
lowest measure
(5) Tabulate
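The five steps can be sketched in Python as follows. Several choices here are assumptions for illustration only: the data are whole numbers, the number of class intervals from Sturges' rule is rounded to the nearest integer, the class width is rounded up, and classes are added until the highest measure is covered. The function name `frequency_table` is hypothetical; the speeds are the 40 values of Problem Exercise 1 below.

```python
import math

def frequency_table(data, num_classes=None):
    """Build (lower limit, upper limit, frequency) rows for integer data."""
    low, high = min(data), max(data)
    data_range = high - low                                   # step (1)
    if num_classes is None:                                   # step (2)
        num_classes = round(1 + 3.32 * math.log10(len(data)))
    width = max(1, math.ceil(data_range / num_classes))       # step (3)
    table = []
    lower = low                                               # step (4)
    while lower <= high:
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)    # step (5)
        table.append((lower, upper, freq))
        lower += width
    return table

speeds = [15, 20, 26, 31, 34, 16, 32, 18,
          18, 25, 25, 33, 33, 17, 28, 22,
          20, 23, 30, 35, 25, 19, 26, 29,
          21, 24, 28, 30, 20, 22, 24, 33,
          22, 27, 32, 31, 25, 24, 25, 30]

table = frequency_table(speeds)
for lower, upper, freq in table:
    print(f"{lower} - {upper}: {freq}")
```

A different rounding convention for the class width would give a different but equally valid table, so the output is one possible tabulation rather than the only one.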
Problem Exercises
1. Construct the frequency distribution table for speeds in KPH of n cars plying
the downtown area
15 20 26 31 34 16 32 18
18 25 25 33 33 17 28 22
20 23 30 35 25 19 26 29
21 24 28 30 20 22 24 33
22 27 32 31 25 24 25 30
15 20 26 31 34 16 32 18 19 26
18 25 25 33 33 17 28 22 20 33
20 23 30 35 25 19 26 26 24 31
21 24 28 30 20 22 24 33 30 28
22 27 32 31 25 24 25 30 22 23
Calculate: (a) Mean (b) Median (c) Mode (d) Variance
(e) Standard Deviation
(3) Mode, $Mo = LCB + \left[ \frac{d_1}{d_1 + d_2} \right] w$
(5) Standard Deviation, $S = \sqrt{S^2}$
Illustration:
Minimum Monthly Salary in P10^3 (thousand pesos) of selected professionals in Metro
Bacolod
Problem Exercise:
Frequency Distribution of Battery Lifetime
   Class interval    f    CM    LCB    LCF    fCM    fCM²
   1.5 - 1.9         2
   2.0 - 2.4         1
   2.5 - 2.9         4
   3.0 - 3.4        15
   3.5 - 3.9        10
   4.0 - 4.4         5
   4.5 - 4.9         3
Complete the table and calculate: Mean, Median, Mode, Variance, Std.
Deviation
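Two of the grouped-data computations can be sketched in Python using the battery lifetime table. The sketch rests on assumptions that the notes do not spell out: the grouped mean is taken as the sum of f x CM divided by n (suggested by the fCM column), d1 and d2 in the mode formula are taken as the modal frequency minus the frequency of the class before and after it, and LCB is taken as half a unit of measurement (0.05) below the lower class limit. Function names are hypothetical.

```python
# (lower limit, upper limit, frequency) from the battery lifetime table
classes = [(1.5, 1.9, 2), (2.0, 2.4, 1), (2.5, 2.9, 4), (3.0, 3.4, 15),
           (3.5, 3.9, 10), (4.0, 4.4, 5), (4.5, 4.9, 3)]
UNIT = 0.1   # assumed unit of measurement of the data

def grouped_mean(classes):
    """Sum of f * CM divided by n, where CM is the class mark (midpoint)."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_mode(classes, unit):
    """Mo = LCB + [d1 / (d1 + d2)] * w for the class with the highest frequency."""
    i = max(range(len(classes)), key=lambda j: classes[j][2])
    lo, hi, f = classes[i]
    f_before = classes[i - 1][2] if i > 0 else 0
    f_after = classes[i + 1][2] if i < len(classes) - 1 else 0
    d1, d2 = f - f_before, f - f_after
    lcb = lo - unit / 2          # lower class boundary
    w = hi - lo + unit           # class width
    return lcb + d1 / (d1 + d2) * w

mean = grouped_mean(classes)
mode = grouped_mode(classes, UNIT)
print(round(mean, 4), round(mode, 4))
```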
PROBABILITY
Approaches to probability:
1. Classical Approach - most often associated with gambling and games of
chance
2. The Relative Frequency Approach - uses past data that have been empirically
observed. It notes the frequency with which some event has occurred in the
past and estimates the probability of its recurrence on the basis of historical
data. Illustration: The price of oil, which changes almost weekly
Example: P(E) = No. of times the event has occurred / Total no. of observations
   Declined  = 20 days    P(declined)  = 20/100 = 20%
   Increased = 50 days    P(increased) = 50/100 = 50%
   Unchanged = 30 days    P(unchanged) = 30/100 = 30%
Example: The probability that a woman will be elevated to the status of Pope.
Are there data to rely on?
Subjective Probability (or Personal Probability) - By the classical and relative
frequency definitions, we can find objective probability values, which indicate
the relative rate of occurrence of the event in the long run, but these values
rest on actual observations of outcomes. In many cases, only a small number of
past outcomes of the event may be available, or none may be available at all.
There may not be any past outcomes to examine, as in the case of marketing a
new product. In this case the
probability of success depends on PERSONAL JUDGMENT
Theorems of Probability
Rules of Probability
1. Additive Rule – applies to the union of events
1a. Mutually exclusive events are events of which no two can occur together.
Such events do not have any common outcomes. If two or more events
are mutually exclusive, then at most one of them will occur. Mutually
exclusive events are disjoint events.
Example: If A and B are mutually exclusive, then the occurrence of one event
excludes the occurrence of the other event. The probability of occurrence of
either A or B is the sum of the individual probabilities of A and B
P(A or B) = P(A) + P(B)
1b. Non-mutually exclusive events. The simple addition theorem no longer applies.
When events are not mutually exclusive, the probability that at least
one of two events A and B will occur is the sum of their individual
probabilities minus the probability that both occur. Such events have
common elements.
P(A or B) = P(A) + P(B) – P(AB)
Example: What is the probability of drawing an Ace or a
Diamond from a deck of playing cards?
N = 52 cards, A = 4, D = 13, AD = 1
P(A or D) = P(A) + P(D) – P(AD)
= 4/52 + 13/52 – 1/52 = 16/52
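The card example can be checked with exact fractions in Python. In this small sketch the helper name `p_union` is hypothetical; for mutually exclusive events the joint probability is simply left at zero.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_both=Fraction(0)):
    """P(A or B) = P(A) + P(B) - P(AB); p_both is 0 for mutually exclusive events."""
    return p_a + p_b - p_both

p_ace = Fraction(4, 52)
p_diamond = Fraction(13, 52)
p_ace_of_diamonds = Fraction(1, 52)

p = p_union(p_ace, p_diamond, p_ace_of_diamonds)
print(p)   # Fraction reduces 16/52 to lowest terms
```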
2. Multiplication Rule
2a. Independent Events – the occurrence of one event does not affect the
probability of occurrence of the other event
P(A and B) = P(A) x P(B)
Example: Tossing of a coin twice
1st toss: P(H1) = 1/2; P(T1) = 1/2
2nd toss: P(H2) = 1/2; P(T2) = 1/2
P(H1 and H2) = ½ x ½ = ¼
P(T1 and T2) = ½ x ½ = ¼
P(H1 and T2) = ½ x ½ = ¼
P(T1 and H2) = ½ x ½ = ¼
____
Total probability = 4/4 = 1.0
2b. Dependent events or conditional probability – the occurrence of
one event is conditioned by the occurrence of the other event
P(A and B) = P(A) x P(B/A) (Event B conditioned by A)
or P(B) x P(A/B) (Event A conditioned by B)
Problem Exercise: There are 7 non-defective and 3 defective items in a box.
If two items are drawn one after the other, what
is the probability of
N1N2, N1D2, D1N2, D1D2?
Given: N = 7
D = 3
P(A) + P(not A) = 1.0
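The multiplication rule for dependent events can be sketched in Python for the box of items. The sketch assumes that the first item is not returned to the box before the second draw; the function name `two_draws` is hypothetical.

```python
from fractions import Fraction

def two_draws(first, second, counts):
    """P(first, then second) = P(first) x P(second / first), without replacement."""
    total = sum(counts.values())
    p_first = Fraction(counts[first], total)
    remaining = dict(counts)
    remaining[first] -= 1                      # one item of that kind is gone
    p_second_given_first = Fraction(remaining[second], total - 1)
    return p_first * p_second_given_first

box = {"N": 7, "D": 3}   # non-defective and defective items
results = {a + "1" + b + "2": two_draws(a, b, box) for a in "ND" for b in "ND"}

for outcome, probability in results.items():
    print(outcome, probability)
print("Total:", sum(results.values()))
```

Because the four outcomes cover every possibility, their probabilities add up to 1.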
Problem Exercises:
1. 1a. In rolling 2 dice, d1 and d2, what is the probability of getting a total of
7 or 11?
1b. What is the probability of getting a total of 7, or of die 1 showing less than 4,
when a pair of dice is tossed?
1c. What is the probability that a throw of two dice totals less than 8
or equal to 8?
2. On a certain freeway, your chance of getting off at Exit 1 is 60%. If you
get off at Exit 1, your chance of getting lost is 30%. If you miss Exit 1 and
have to get off at Exit 2, your chance of getting lost is 70%. What is
the probability of getting lost? Of not getting lost?
3. Two cards are drawn from a deck. Find the probability that an Ace and a
King will appear if the first card is (a) returned (b) not returned
8. One bag contains 4 white balls and 3 black balls, and a second bag
contains 3 white balls and 5 black balls. One ball is drawn from the first
bag and placed unseen in the second bag. What is the probability that
a ball now drawn from the second bag is black?
9. A town has one fire engine and one ambulance available for
emergencies. The probability that the fire engine is available when needed is 95%,
and the probability that the ambulance is available when called is 90%. In the event of an injury
resulting from a burning building, find the probability that (a) both the
fire engine and the ambulance are available, (b) the fire engine
is available and the ambulance is not.
10. The probability that the lady of the house is at home when
the Avon representative calls is 0.6. Given that the lady of
the house is at home, the probability that she makes a purchase is
0.4. Find the probability that the lady of the house is home and
makes a purchase when the Avon representative calls.
11. Two cards are drawn from a deck. Find the probability of drawing an
Ace and a King if the first card drawn is (a) returned to the deck
(b) not returned to the deck
Example. In how many ways may two girls and two boys be
seated in a row of four seats?
There are 720 ways in which 1st, 2nd, and 3rd places can
be selected by the judges from 10 contestants. Given that
each contestant has an equal likelihood of being selected
for each of the three places, there is a probability of 1/720 that
any three contestants will finish in a specific order. P = 1.39 x 10^-3
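The counts in these examples can be verified with Python's `math` module (Python 3.8 or later for `math.perm`). The seating count below assumes that the four children are distinct and that there is no restriction on who sits where, which the notes do not state.

```python
import math

# Ordered selection of 1st, 2nd, and 3rd places from 10 contestants
ways_top_three = math.perm(10, 3)
p_specific_order = 1 / ways_top_three

# Seating 4 distinct children in a row of 4 seats (assumed unrestricted)
seatings = math.factorial(4)

print(ways_top_three, p_specific_order, seatings)
```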
Problem Exercises:
1. How many groups of five students can be drawn, in any order, from 7
students?
2. The President of the A-1 Company must select four of her six vice
presidents to handle problems when they arise. How many different
arrangements of vice presidents can the president devise?
3. Five lifeguards are available for duty one Saturday afternoon. There are 3
lifeguard stations. In how many ways can three lifeguards be chosen and
ordered among the stations?
4. A company has hired 15 new employees, and must assign 6 to the day
shift, 5 to the graveyard shift, and 4 to the night shift. In how many ways
can the assignment be made?
5. In how many different ways can 3 red, 4 yellow, and 2 blue bulbs be arranged
in a string of Christmas tree lights with 9 sockets?
6. From 5 male and 4 female engineering students, find the number of
committees of 4 that can be formed with three males and one female.
7. Five manufacturers produce a certain electronic device whose quality
varies from manufacturer to manufacturer. If you are to select three
manufacturers at random, what is the chance that the selection would
contain exactly two of the best three?
$$ P(x = 5) = \frac{15!}{5!\,(15 - 5)!} (.10)^{5} (.90)^{15 - 5} = .0105 \text{ or } 1.05\% $$
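The binomial computation above can be reproduced in Python. The function name `binomial_pmf` is chosen for illustration; `math.comb` supplies the n! / (x! (n - x)!) factor.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

p_five = binomial_pmf(5, 15, 0.10)
p_at_least_one = 1 - binomial_pmf(0, 15, 0.10)   # complement of "none"

print(round(p_five, 4), round(p_at_least_one, 4))
```

"At least one" questions are usually easiest through the complement, as in the second line of the computation.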
Problem Exercises:
1. A salesman for a local distributor of housewares makes 10 house
calls a day. The probability that he makes a sale on any given call is
20%. What is the probability that today he will make
a. no sale
b. at least one sale
c. 2 sales
2. The Bus. Office revealed that 10% of the college students cannot
settle their accounts at the end of the semester. What is the
probability that, of 15 students selected at random,
a. 5 cannot settle their accounts
b. none is unable to settle their accounts
c. at least one cannot settle their accounts
d. at most 2 cannot settle their accounts
3. A company makes color TVs, 5% of which are defective. Ten color
TVs are shipped to the dealer. If each TV set assembled is
considered an independent entity, what is the probability that the
shipment of TVs contains
a. no defective TV
b. at least one defective TV
c. fewer than 2 defective TVs
d. exactly two defective TVs
4. Records show that the A-1 Computer is up and running 80% of the
time. Bert, the resident computer jock of A-1, examines the
computer nine times. What is the probability that it is functional
3 times? That it is functional all nine times Bert inspects the
computer?
5. Five persons are required to operate a chemical process; the
process cannot be started until all 5 work stations are manned.
Employees’ records indicate that there is a 40% chance of any one
coming late, and they all come independently. The management is
interested in knowing the probability of anyone coming late so
that back-up personnel can be arranged. What is the probability
of (a) none coming late (b) 1 or 2 or 3 coming late?
$$ P(x) = \frac{\lambda^{x}}{e^{\lambda}\, x!} $$
Where: X = Poisson random variable with possible values of 0, 1, 2, ..., 10
λ = mean number of successes in a given time interval or region of space
e = base of natural logarithms = 2.7183
P(X < 7) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6)
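A short Python sketch of the Poisson formula and of the sum P(X < 7) follows. The mean λ = 4 is a hypothetical value chosen only to have something to compute, and the function names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = lam**x / (e**lam * x!)."""
    return lam ** x / (math.exp(lam) * math.factorial(x))

def poisson_below(limit, lam):
    """P(X < limit) = P(0) + P(1) + ... + P(limit - 1)."""
    return sum(poisson_pmf(x, lam) for x in range(limit))

lam = 4   # hypothetical mean number of successes per interval
print(round(poisson_below(7, lam), 4))
```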
Problem Exercises:
Opinion on RH Bill by income group
               Low   Middle   High   Total
   Favor        58     63      27     148
   Against      42     77      33     152
   Total       100    140      60     300
* Prob. the voter favors RH Bill = 148/300
(2). Joint prob. of two independent events = Observed frequency where the two events meet / Grand Total
* Prob. the voter is against RH Bill and belongs to middle income group = 77/300
(3). Conditional prob. of two dependent events = Observed frequency where the 2 events meet / Subtotal of conditioning event
* Prob. the voter belongs to low income given against RH Bill = 42/152
(4). Prob. of non-mutually exclusive events = Add the prob. of each event and subtract the
joint probability
* Prob. the voter belongs to middle income or against RH Bill = 140/300 + 152/300 – 77/300 = 215/300
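The table-based rules above can be sketched in Python with exact fractions. The column names Low, Middle, and High are an assumption: the notes give only the counts, and the income labels are inferred from the worked probabilities (42/152, 77/300, 140/300). Helper names are hypothetical.

```python
from fractions import Fraction

# Counts from the notes; income-group column labels are inferred
table = {
    "Favor":   {"Low": 58, "Middle": 63, "High": 27},
    "Against": {"Low": 42, "Middle": 77, "High": 33},
}
grand_total = sum(sum(row.values()) for row in table.values())

def row_total(opinion):
    return sum(table[opinion].values())

def col_total(income):
    return sum(row[income] for row in table.values())

p_favor = Fraction(row_total("Favor"), grand_total)
p_against_and_middle = Fraction(table["Against"]["Middle"], grand_total)
p_low_given_against = Fraction(table["Against"]["Low"], row_total("Against"))
p_middle_or_against = (Fraction(col_total("Middle"), grand_total)
                       + Fraction(row_total("Against"), grand_total)
                       - p_against_and_middle)

print(p_favor, p_against_and_middle, p_low_given_against, p_middle_or_against)
```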
Exercises: 1. Prob. the voter belongs to high income and favors RH Bill
2. Prob. the voter belongs to low income given favors RH Bill
3. Prob. the voter belongs to middle income or favors RH Bill
4. Of the 1000 employees of A-1, Inc., 300 are males and the rest females. A total of 130
males and 200 females are in favor of the proposed salary scale. What is the probability
that an employee selected at random
4a. is a male and favors the proposed salary scale
4b. is a female given she is against the proposed salary scale
4c. is a male or against the proposal
4d. both favors the proposal
MULTINOMIAL DISTRIBUTION
The binomial becomes a multinomial if we let each trial have more than two
possible outcomes. Theorem: If a given trial can result in k possible outcomes or
events E1, E2, ..., Ek with probabilities P1, P2, ..., Pk, then the probability distribution of the random
variables X1, X2, ..., Xk, representing the number of occurrences of events E1, E2, ..., Ek in n
independent trials, is
$$ P(X_1, X_2, \ldots, X_k;\ P_1, P_2, \ldots, P_k,\ n) = \frac{n!}{X_1!\, X_2! \cdots X_k!}\, P_1^{X_1} P_2^{X_2} \cdots P_k^{X_k} $$
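The multinomial formula translates directly into a few lines of Python. In this sketch the function name `multinomial_pmf` is hypothetical, and the numbers are those of the convention-delegate exercise that follows (3 by air, 1 by bus, 4 by car, 2 by train).

```python
import math

def multinomial_pmf(counts, probs):
    """n! / (x1! x2! ... xk!) * p1**x1 * p2**x2 * ... * pk**xk."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for x in counts:
        coefficient //= math.factorial(x)   # stays an exact integer
    probability = 1.0
    for x, p in zip(counts, probs):
        probability *= p ** x
    return coefficient * probability

p = multinomial_pmf([3, 1, 4, 2], [0.40, 0.20, 0.30, 0.10])
print(round(p, 4))
```

With only two outcomes the function reduces to the binomial formula, which is a quick way to sanity-check it.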
Prob. Exercise 1. The probabilities are 0.40, 0.20, 0.30, and 0.10, respectively, that
a delegate to a certain convention arrived by air, bus, car, or train. What is
the probability that among 10 delegates selected at random at this
convention, 3 arrived by air, 1 by bus, 4 by car, and 2 by train?
3. The probabilities that the cars for rent at a local airport are available are:
20% Honda, 25% Toyota, 15% Nissan, 18% Hyundai, and 22% Ford. The
proprietor of a car rental agency randomly selects
10 cars to chauffeur delegates from the airport to the convention center.
Find the probability that
2 Ford, 3 Nissan, 2 Honda, 1 Hyundai, and 2 Toyota cars are available.
HYPERGEOMETRIC DISTRIBUTION
Probability of selecting x “successes” from the k items labeled as “success” and n − x from the N − k items
labeled as “failure”, when a random sample of size n is selected from a finite
population of size N.
Properties of hypergeometric experiment
1. A random sample of size n is selected from a population of N items
2. k of the N items may be classified as successes and N − k classified as
failures
3. X = hypergeometric random variable
$$ P(x; N, n, k) = \frac{{}_{k}C_{x} \cdot {}_{N-k}C_{n-x}}{{}_{N}C_{n}} $$
$$ \mu = n\left(\frac{k}{N}\right) \quad \text{(mean)} $$
$$ \sigma^{2} = n\left(\frac{k}{N}\right)\left(\frac{N-k}{N}\right)\left(\frac{N-n}{N-1}\right) $$
$$ P(x_1, x_2, \ldots, x_k;\ a_1, a_2, \ldots, a_k;\ N, n) = \frac{({}_{a_1}C_{x_1})({}_{a_2}C_{x_2}) \cdots ({}_{a_k}C_{x_k})}{{}_{N}C_{n}} $$
Prob. Exercise:
A homeowner wishes to plant different kinds of vegetables in his vegetable
garden. He selects 5 seedlings at random from a box that contains 5 lettuce
and 4 tomato seedlings. What is the probability that he will plant 2 lettuce
and 3 tomato seedlings?
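The seedling exercise can be set up in Python as a hypergeometric probability. The sketch treats lettuce as the "success" category (k = 5 of N = 9 items, sample size n = 5), which is an arbitrary labeling; the function name `hypergeometric_pmf` is hypothetical.

```python
import math
from fractions import Fraction

def hypergeometric_pmf(x, N, n, k):
    """P(x; N, n, k) = kCx * (N-k)C(n-x) / NCn, as an exact fraction."""
    return Fraction(math.comb(k, x) * math.comb(N - k, n - x), math.comb(N, n))

# 9 seedlings in the box: 5 lettuce ("successes") and 4 tomato; 5 are selected
p_two_lettuce = hypergeometric_pmf(2, N=9, n=5, k=5)
mean_lettuce = Fraction(5 * 5, 9)          # mu = n (k / N)

print(p_two_lettuce, float(p_two_lettuce), mean_lettuce)
```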
The Empirical Rule – regardless of the value of the mean and standard deviation,
$$ Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad Z = \frac{X - \bar{X}}{s} $$
Where: Z is the normal deviate and X is some specified value for the random variable.
Z measures the number of standard deviations an observation is from the mean.
After the conversion process, the mean of the distribution is 0 and SD is 1
Illustration: Telecom, a telephone answering service for business executives in Metro Bacolod,
has found that the average telephone message is 150 sec. with SD of 15 sec. The
length of messages is normally distributed. A particular phone message took 180
seconds. How many seconds longer than the average is it?
Solution. $Z = \frac{X - \mu}{\sigma} = \frac{180 - 150}{15} = 2$ SD’s, or 30 seconds above the mean
What is the probability that a single message takes between 150 sec. and 180 sec.?
Solution: $Z = \frac{180 - 150}{15} = 2$. From the normal curve table, Z = +/- 2.0 is equivalent to .4772
Telecom concludes that there is a 47.72% chance that any single telephone message will
last 150 sec. to 180 sec.
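The table lookup in the Telecom illustration can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for checking table values, not a replacement for learning to read the normal curve table.

```python
from statistics import NormalDist

message_length = NormalDist(mu=150, sigma=15)

z = (180 - message_length.mean) / message_length.stdev
p_between = message_length.cdf(180) - message_length.cdf(150)

print(z, round(p_between, 4))
```

The same two `cdf` calls answer any "between a and b" question for a normally distributed variable.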
Exercises: Find the area under the normal curve. Use the normal curve table
Problem Exercises:
1. A random sample of 1000 construction workers gave their average daily wage at
P420 with SD of P35. Assuming that daily wages to be normally distributed,
a. what is the probability that a worker selected at random earns between
P380 and P490 a day?
b. how many workers earn less than P450 a day?
c. if workers who earn P480 and above are asked to contribute P70 each for a sick
co-worker, how much is the expected total contribution?
2. A study of prevailing market prices for one day shows that the average price of rice
per kilo is P40.00 with SD of P1.50. Assuming that prices are normally distributed,
a. What percentage of rice sells at higher than P43.00 per kilo?
b. If in a particular market, 1000 sacks of rice were sold, how many kilos were sold
at less than P42.00 per kilo? (1 sack = 50 kilos)
c. What average price per kilo should the government try to maintain so that
80% of rice sells at not more than P42.50 .
3. The average life of a certain type of motor is 10 years. The manufacturer replaces
free all motors that fail while under guarantee. If he is willing to replace only 3%
of the motors that fail, how long a guarantee should he offer? SD is 2 years.
4. Assume that heights of women in a population follow a normal curve with mean of
64.3 inches and SD of 2.6 inches.
a. What proportion of women stand between 60 inches and 66 inches?
b. A certain woman has a height 0.5 SD above the mean. What proportion
of women are taller than she?
5. A distribution of test scores in Statistics follows a normal distribution with mean of
80 and std. deviation of 12. There are 120 students who took the test.
a. How many scores do you expect to find above 100
b. How many scores do you expect to find between 90 and 110
6. A soft drink machine is regulated so that it discharges an average of 200 ml
per cup. If the amount of drink is normally distributed with standard deviation
of 15 ml,
a. What fraction of the cups will contain more than 225 ml?
b. How many cups will likely overflow if 235 ml cups are used in the next
1000 drinks?
c. Below what value do we get the smallest 20 % of the drinks?
The values of the coefficient of correlation are between -1.0 and +1.0.
If r is +1.0, it indicates that the two variables are perfectly related in
a positive sense, which means that if X increases, Y also increases; if X decreases,
Y does likewise. If r is negative, it indicates that X and Y are inversely related,
meaning that if X increases, Y decreases.
r = 0.00 No correlation
r from +/- .01 to +/- .19 Negligible correlation
r from +/- .20 to +/- .39 Low Correlation
r from +/- .40 to +/- .59 Moderate Correlation
r from +/- .60 to +/- .79 Moderately High Correlation
r from +/- .80 to +/- 1.00 High Correlation
Techniques of Correlation
$$ r = 1 - \frac{6 \sum d^{2}}{N(N^{2} - 1)} $$
Where: 6 = constant
N = no. of pairs
d = difference between ranks
Example: Relationship between Height and Weight of persons
$$ r = \frac{\sum (dx \cdot dy)}{\sqrt{(\sum dx^{2})(\sum dy^{2})}} $$
Where: $dx = X - \bar{X}$, $dy = Y - \bar{Y}$
Illustration:
(1) The relationship between AGE of machines and its REPAIR costs
Candidates: A B C D E F G H I J
Judge X : 2 3 1 6 4 5 10 7 8 9
Judge Y : 2 1 3 5 6 4 8 10 9 7
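The rank-difference formula, r = 1 - 6 Σd² / (N(N² - 1)), can be applied to the two judges' rankings in a few lines of Python. The sketch assumes that the listed numbers are already ranks with no ties; the function name is hypothetical.

```python
def rank_correlation(ranks_x, ranks_y):
    """r = 1 - 6 * sum(d**2) / (N * (N**2 - 1)), d = difference between ranks."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_x = [2, 3, 1, 6, 4, 5, 10, 7, 8, 9]
judge_y = [2, 1, 3, 5, 6, 4, 8, 10, 9, 7]

r = rank_correlation(judge_x, judge_y)
print(round(r, 2))
```

On the interpretation scale given earlier in the notes, a value of this size falls in the "High Correlation" band.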
Advantages:
1. Rank Order – provides a convenient way of estimating coef. of correlation if
N is small
2. Pearson Product-Moment(PPM) – takes into account the absolute size of
the measures and not merely their rank position
Coefficient of Determination, r², expresses the proportion of total variation in Y that
can be accounted for or explained by the independent variable X.
Thus, if r = .60, then r² = .36, meaning 36% of the variation in Y is accounted for by X
Regression Analysis
where: $$ b = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^{2} - (\sum X)^{2}}, \qquad a = \bar{Y} - b\bar{X} $$
thus, $\hat{Y} = a + bX \pm SE_{est}$
SEest is the standard error of estimate. It measures the dispersion about an average
line called the regression line.
$$ SE_{est} = \sqrt{\frac{\sum Y^{2} - a(\sum Y) - b(\sum XY)}{n - 2}} $$
   Age of machine (years)    Repair cost
            5                   3.5
            6                   5.0
            7                   5.0
            9                   5.2
           12                   6.0
           13                   6.0
           15                   6.2
           16                   7.1
a. Determine the coefficient of correlation, r
b. Estimate the repair cost of a ten-year-old machine
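The regression formulas can be sketched in Python using the eight data pairs above. It is an assumption that the first column is the age of the machine (X) and the second its repair cost (Y), since the column headings did not survive. The correlation is computed from raw sums, which is algebraically equivalent to the deviation form of the product-moment formula. The function name `linear_regression` is hypothetical.

```python
import math

ages = [5, 6, 7, 9, 12, 13, 15, 16]                  # assumed X: age of machine
costs = [3.5, 5.0, 5.0, 5.2, 6.0, 6.0, 6.2, 7.1]     # assumed Y: repair cost

def linear_regression(xs, ys):
    """Return a, b, r, and SEest for the line Y^ = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    se_est = math.sqrt((syy - a * sy - b * sxy) / (n - 2))
    return a, b, r, se_est

a, b, r, se_est = linear_regression(ages, costs)
estimate_at_10 = a + b * 10

print(round(a, 4), round(b, 4), round(r, 4), round(se_est, 4))
print(round(estimate_at_10, 2))
```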
2. The data below represent the electrical energy consumption in KWH and the
amount due over a period of 6 months.
3. Data below represent the supply of product A and its price per unit
4. The marketing research dept. of A-1, Inc. wanted to study the relationship
between the Advertising expenditures and Sales volume of a certain product.
Estimate the sales volume for an advertising expense of P50,000
5. Relationship between the teaching performance and tenure in years of ten teachers
Teaching Tenure in
Performance, % Years
84 4
86 6
90 14
87 8
92 15
94 12
95 16
80 5
85 7
88 10
Problem illustration:
Mr. de los Santos has been concerned for some time with the overhead
costs in his furniture shop. For the last 7 months he has kept a record not only of the
direct labor hours in the shop but also of the total cost of lumber used in the operation.
The data are found in the following table:
TESTS OF HYPOTHESIS
Kinds of Hypothesis
4. Non-directional hypothesis does not specify the direction of the relationship between
variables. It merely states the presence or absence of a relationship between two
variables, or that one variable influences the other variable.
Example: There is a significant difference between the performance ratings
of students who attended the review class and those who did not
A ≠ B ; X ≠ Y
Level of Significance, α
The significance level of a test is the maximum value of the probability of
rejecting the null hypothesis when in fact it is true.
A 5% level of significance means there is at most a 5% chance of rejecting a true
null hypothesis; in other words, you are 95% confident of not rejecting the null
hypothesis when it is true.
Tests of Hypothesis
1. Z-test is used when the population standard deviation is known. Sample size is 30 or
more (n is 30 or more, Walpole 3rd ed)
2. t-test is used when only the sample standard deviation is known. Sample size is less than 30
Uses of Z – test:
$$ Z = \frac{\bar{X} - \mu}{\sigma} \sqrt{n} $$
µ = population mean, $\bar{X}$ = sample mean
n = sample size, σ = population standard deviation
Note: For Z – test, use table of critical values of Z based on the area under the
normal curve.
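A one-sample Z-test can be sketched in Python as follows. The figures are those of the cigarette nicotine exercise later in these notes (claimed mean 1.83 mg, standard deviation .11 mg, sample of 50 with mean 1.90 mg, 1% alpha). Treating the test as two-tailed is an assumption, and the function name `one_sample_z` is hypothetical; `NormalDist().inv_cdf` stands in for the table of critical values.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, population_mean, population_sd, n):
    """Z = (sample mean - population mean) * sqrt(n) / population SD."""
    return (sample_mean - population_mean) * math.sqrt(n) / population_sd

z = one_sample_z(1.90, 1.83, 0.11, 50)
alpha = 0.01
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed table value (assumed)
reject_h0 = abs(z) >= critical

print(round(z, 2), round(critical, 3), reject_h0)
```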
$$ Z = \frac{P_1 - P_2}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}} $$
P1 = proportion of 1st sample, Q1 = 1 − P1
P2 = proportion of 2nd sample, Q2 = 1 − P2
Accept Ho if the absolute value of the computed Z is less than the table value;
reject Ha.
Reject Ho if the absolute value of the computed Z is equal to or greater than the table
value; accept Ha.
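The Z-test for two proportions, Z = (P1 - P2) / sqrt(P1 Q1 / n1 + P2 Q2 / n2), together with the decision rule, can be sketched in Python. The counts are those of the TV program exercise later in the notes (80 of 200 men versus 75 of 250), and the table value 1.96 assumes a two-tailed test at 5% alpha. The function name is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = two_proportion_z(80, 200, 75, 250)
table_value = 1.96                     # two-tailed, 5% alpha (assumed)
reject_h0 = abs(z) >= table_value      # decision rule from the notes

print(round(z, 3), reject_h0)
```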
Uses of t-test:
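$$ t = \frac{\bar{X} - \mu}{s} \sqrt{n - 1} $$
s = sample standard deviation
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{n_1 s_1^{2} + n_2 s_2^{2}}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sum d^{2}}{n(n - 1)}}} $$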
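The first two t formulas can be sketched in Python. The sketch assumes the one-sample statistic is t = (sample mean - µ) sqrt(n - 1) / s and that the two-sample statistic pools the variances as (n1 s1² + n2 s2²) / (n1 + n2 - 2), which is how the garbled layout is best read. The numbers come from the light bulb and thread exercises below; the function names are hypothetical. The Python standard library has no t distribution, so the computed values still have to be compared with a t table.

```python
import math

def one_sample_t(sample_mean, population_mean, s, n):
    """t = (sample mean - population mean) * sqrt(n - 1) / s."""
    return (sample_mean - population_mean) * math.sqrt(n - 1) / s

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """t for two independent samples with a pooled variance."""
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

t_bulbs = one_sample_t(518, 500, 40, 25)                  # df = 24
t_threads = two_sample_t(78.3, 5.6, 25, 87.2, 6.2, 25)    # df = 48

print(round(t_bulbs, 3), round(t_threads, 3))
```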
Problem Exercises:
1. Two types of wires are being compared for strength. Fifty pcs. of each type of wire are
tested under similar conditions. Type A has an average tensile strength of 78.3 Nt, while
type B has an average tensile strength of 87.2 Nt. The combined standard deviation of the wires is 5.6
Nt. Which type of wire is stronger? Test at 1% alpha.
2. A sample survey of TV programs in Metro Bacolod shows that 80 of 200 men prefer
watching NBA. From another sample group, 75 of 250 prefer watching PBA. What is
the preference of men? Test at 5% alpha.
3. A cigarette manufacturer claims that his cigarettes have an average nicotine content of
1.83 mg. and standard deviation of .11 mg. If a random sample of 50 cigarettes of this
type has an average nicotine content of 1.90 mg, will you agree with the claim of the
manufacturer? Test at 1% alpha.
4. A researcher wishes to find out whether or not there is a significant difference between
the weekly pay of night and day shiftees of a certain company. By random sampling, she
selected 25 day and 27 night shiftees and computed their mean weekly pay and
standard deviations. The day shiftees have a mean weekly pay of P1575 with standard
deviation of P55. The night shiftees have a mean weekly pay of P1850 and standard
deviation of P65. Do the night shiftees earn significantly more than the day
shiftees? Test at 5% alpha.
1 2.0 1.9
2 2.0 1.9
3 2.3 2.0
4 2.1 2.1
5 2.4 2.3
6. Data from the subdivision survey show that the average monthly electrical
consumption of residential homes is 150 KWH with standard deviation of 18 KWH. A
sample of 70 residential homes was selected randomly and found to consume on
the average 190 KWH. Are the 70 residences consuming significantly more than the
rest? Test at 2.5% alpha.
7. Alpha Company manufactures steel cable with an average tensile strength 150 Nt.
The laboratory tested 30 pcs. and found to have average tensile strength of 145 Nt.
and standard deviation of 6.5 Nt. Is the result of the laboratory in accordance with
8. Two types of thread are compared for strength. Twenty five pieces of each type of
thread are tested under similar conditions. Type X has an average strength of 78.3
Kg, and standard deviation of 5.6 Kg. Type Y has an average strength of 87.2 Kg
and standard deviation of 6.2 Kg. Which type of thread is stronger? Test at 1%
level of significance.
9. A manufacturer of light bulbs claims that his light bulbs burn on the average
500 hrs. To maintain this average, he tests 25 bulbs each month. If the computed
t value falls between $-t_{.05}$ and $t_{.05}$, he is satisfied with his claim. What conclusion
should he draw from a sample that has a mean of 518 hours and standard deviation of
40 hrs.? Assume the distribution of burning times to be approximately normal.
10. The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and
standard deviation 0.9 year, while those of manufacturer B have lifetime of 6.0 years
and standard deviation of 0.8 year. What is the probability that a random sample of 36
tubes from manufacturer A will have mean lifetime that is at least one year more than
the mean lifetime of a sample of 49 tubes from manufacturer B.
Chi Square is used as a test of significance when data are expressed in frequencies
or data are in terms of percentages or proportions and that can be reduced to frequency.
The applications of chi square are with discrete data; however, any continuous data may
be reduced to categories and the data so tabulated that chi square may be applied.
Example: Scores on a test of mental ability and on a dexterity test could be tabulated
into a contingency table.
Dexterity Test Score
To use the chi square statistic, the data must be independent, i.e., no response
is related to any other response. The categories into which data are placed must also be
mutually exclusive, i.e., each observation must be counted in one and only one category.
Finally, all the observed data must be used in a chi square problem.
Classification of Data
1. To test the “goodness of fit” to a normal curve, i.e., to find out whether or not a
sample distribution conforms to a hypothetical/ideal distribution
Example: Tossing of a coin 10 times
fo fe
Head 4 5
Tail 6 5
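As a minimal sketch of the goodness-of-fit computation for the coin example above, the Python snippet below evaluates the usual chi square statistic, the sum of (fo - fe)^2 / fe over all categories. The function name chi_square_gof is a hypothetical helper, not something from these notes, and the snippet assumes df = (number of categories) - 1.

```python
def chi_square_gof(observed, expected):
    """Return the chi square statistic: sum of (fo - fe)^2 / fe."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Coin tossed 10 times: observed (fo) and expected (fe) frequencies.
observed = [4, 6]   # Head, Tail
expected = [5, 5]   # a balanced coin gives 5 and 5

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # assumption: df = categories - 1

print(f"chi square = {chi2:.2f}, df = {df}")
```

The computed value is then compared with the chi square table value at the chosen alpha and df.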
fo fe
Reduced in weight 18 22
Did not reduce in weight 12 8
Status
School Hired Not hired Total
Problem Examples:
1. A public opinion poll was conducted on attitudes toward women in the military.
Some 113 subjects were interviewed. The question asked was
“Do you favor women in the military?” Test at 2% alpha.
Attitude
Male      30   20    5    55
Female    38   18    2    58
__________________________________________________
Total     68   38    7   113
3. In 100 tosses of a coin, 63 heads and 37 tails are observed. Is this a balanced coin?
Test at 1% alpha.
With hypertension 21 36 30
No hypertension 48 26 19
_________________________________________________________
4. A marketing research department has divided a certain sales region into six
districts. It is believed that all districts have the same sales potential. The number
of units sold in each district is given below. Test whether or not the six districts
have equal sales potential. Test at 2% alpha.
1 12
2 18
3 15
4 25
5 22
6 28
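Note: For a 2x2 table, df = (2-1)(2-1) = 1, and chi square can be obtained without computing fe:
a    c    k
b    d    l
m    n    N
where k and l are the row totals, m and n are the column totals, and N is the grand total.

$$\chi^2 = \frac{N(ad - bc)^2}{k\,l\,m\,n}$$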
Note: For other types of table, say 2x3, 3x3, 3x4, 4x5, 5x5:
df = (rows - 1)(columns - 1)
fe = (R x C)/N, where R is the row total, C is the column total, and N is the grand total
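The two notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the 2x2 counts are made-up illustration data (not taken from these notes), and the shortcut used is the standard form N(ad - bc)^2 / (k l m n) with the cells laid out as a, c in the first row and b, d in the second.

```python
def expected_frequencies(table):
    """fe = (row total x column total) / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Return (chi square, df) for an r x c table of observed counts."""
    fe = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, fe)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_2x2_shortcut(table):
    """Shortcut for a 2x2 table laid out as [[a, c], [b, d]]."""
    (a, c), (b, d) = table
    k, l = a + c, b + d      # row totals
    m, n = a + b, c + d      # column totals
    total = k + l            # grand total N
    return total * (a * d - b * c) ** 2 / (k * l * m * n)


# Hypothetical 2x2 counts, for illustration only.
table = [[10, 20],
         [30, 40]]

chi2_general, df = chi_square_table(table)
chi2_short = chi_square_2x2_shortcut(table)
print(f"general: {chi2_general:.4f}, shortcut: {chi2_short:.4f}, df = {df}")
```

Both routes should give the same value for any 2x2 table; the shortcut only saves the step of computing each fe.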
Problem Exercise
A study was conducted to determine the relationship between sales and location
of fast food.
Sales
ANALYSIS OF VARIANCE
(ANOVA)
Analysis of Variance is a widely used and highly developed parametric statistical
method for comparing three or more means.
Assumptions Underlying the Use of ANOVA
1. The individuals in the group and subgroups are selected randomly from a normally
distributed population
2. The samples that constitute the groups are independent
Sources of Variation:
CAR      XA      XB      XC     XA^2    XB^2    XC^2
1        12      18       6      144     324      36
2        18      17       4      324     289      16
3        16      16      14      256     256     196
4         8      18       4       64     324      16
5         6      12       6       36     144      36
6        12      17      12      144     289     144
7        10      10      14      100     100     196
_________________________________________________________________
Total    82     108      60     1068    1726     640
Mean  11.71   15.43    8.57
Ho: There is no significant difference among the distances traveled by the cars with
one liter of gasoline
Ha: There is a significant difference among the distances traveled by the cars with one
liter of gasoline
ANOVA TABLE
Source of Variation   Sum of Squares   df                              Mean Square
Between groups        SSB = 164.95     dfB = c - 1 = 3 - 1 = 2         MSSB = SSB/dfB = 164.95/2 = 82.48
Within groups         SSW = 292.86     dfW = dfT - dfB = 20 - 2 = 18   MSSW = SSW/dfW = 292.86/18 = 16.27
Total variance        TSS = 457.81     dfT = N - 1 = 21 - 1 = 20
F-test = MSSB/MSSW
= 82.48/16.27
= 5.07
The F-test is interpreted with the use of the F table (df = 2 and 18).
At 5% alpha, the F-table value is 3.55
At 1% alpha, the F-table value is 6.01
Thus, at 5% alpha, F-test = 5.07 > 3.55, Ho: Rejected, Ha: Accepted
and at 1% alpha, F-test = 5.07 < 6.01, Ho: Not rejected
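Conclusion at 5% alpha: There is a significant difference among the distances traveled by the cars with one liter of gasoline.
Conclusion at 1% alpha: There is no significant difference among the distances traveled by the cars with one liter of gasoline.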
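The worked example above can be reproduced with a short Python sketch. It assumes the usual computational formulas (correction factor = (grand total)^2 / N, TSS = sum of squared scores minus the correction factor, SSB from the group totals, SSW = TSS - SSB); the function name one_way_anova is hypothetical, and the critical values 3.55 and 6.01 are simply copied from the F table figures quoted in these notes. Small differences from hand-computed figures can arise when group means are rounded before squaring.

```python
def one_way_anova(groups):
    """Return sums of squares, df, mean squares and F for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total          # (grand total)^2 / N

    tss = sum(x * x for x in all_values) - correction    # total sum of squares
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = tss - ssb

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    msb = ssb / df_between
    msw = ssw / df_within
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "df_between": df_between, "df_within": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Distances traveled by cars A, B and C, from the table in the notes.
car_a = [12, 18, 16, 8, 6, 12, 10]
car_b = [18, 17, 16, 18, 12, 17, 10]
car_c = [6, 4, 14, 4, 6, 12, 14]

result = one_way_anova([car_a, car_b, car_c])
print(f"SSB = {result['SSB']:.2f}, SSW = {result['SSW']:.2f}, "
      f"TSS = {result['TSS']:.2f}, F = {result['F']:.2f}")

# Critical values taken from the notes for df = (2, 18).
for alpha, f_table in ((0.05, 3.55), (0.01, 6.01)):
    decision = "reject Ho" if result["F"] > f_table else "do not reject Ho"
    print(f"alpha = {alpha}: F table = {f_table}, {decision}")
```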
Problem Exercises:
Trainees Raters
A B C D
1 10 6 8 7
2 4 5 3 4
3 8 4 7 4
4 3 4 2 2
5 6 8 6 7
6 9 7 8 7
(2) In an experiment designed to compare the effects of coaching on the scores obtained on
an aptitude test used for entrance to a professional school, three levels of coaching were
used: none, 4 hours, and 12 hours. A random sample of 18 applicants was chosen from
the population of applicants. Their scores are given below: Test at 5% alpha.
Scores Obtained
1 30 32 35
2 27 30 33
3 26 29 32
4 24 27 30
5 22 25 27
6 20 24 26
State: Ho: Coaching has no significant effect on the aptitude test scores of applicants to
professional schools
Ha: Coaching has a significant effect on the aptitude test scores of applicants to
professional schools