
* PROBABILITY AND STATISTICS*

(STATISTICAL METHODS)
(Lecture Notes)

Introduction

Definition
Statistics - the science of the systematic collection, tabulation,
presentation, analysis, and interpretation of numerical data.
Collection is the process of obtaining numerical data.
Tabulation or presentation of data refers to the organization of
data into tables, graphs, or charts so that logical and statistical
conclusions can be derived from the collected data.
Analysis of data pertains to the process of extracting relevant
information from which numerical descriptions can be formulated.
Interpretation of data refers to the task of drawing conclusions
from the analyzed data. It involves a sample derived from the
population.
Process of Collecting Data
1. Questionnaire – indirect or survey
2. Interview – direct
3. Registration
4. Observation
5. Experiment
Sampling Methods
1. Lottery
2. Random Number Table
3. Systematic nth
4. Stratified or proportionate
5. Cluster or area
6. Multistage

Areas of Statistics
1. Descriptive Statistics – concerned with the problem of describing
mass of data in a concise, clear, useful and informative way. This
is done by considering such techniques as graphing, tabular
presentation and calculating averages and dispersion.
2. Inferential Statistics – demands a higher order of critical
judgment and mathematical methods. It aims to give information
about a large set of data without dealing with each and every
element of the group. It uses only a small portion of the total set of
data in order to draw conclusions or judgments regarding the
entire population.

Uses of Statistics

Statistics has general applicability. It is an essential tool in education,
government, business, economics, medicine, psychology, sociology, industry,
and sports. To mention a few:
1. Education – statistical tools are used to get information on enrolment
and physical facilities. Such data are needed for the intelligent administration
and management of academic institutions.
2. Government – data on the population, manpower, and material strength of
the nation. Data on movement of population, cost of living, taxes, wages,
and material resources are needed for policy making and administration.
3. Psychology – data on IQ tests, admission tests, aptitude tests, personality
trait ratings, and attitude tests help psychologists understand the human
person better.
4. Sociology – statistics is used in the study of the conditions of the society
in which man lives. Observations, when properly analyzed and interpreted,
may effect positive action toward the improvement of society.

Classification of Data
1. One-way classification – has only one variable described by at least two
categories
Example:
    Civil Status      f
    Single           20
    Married          25
    Widow            15
    Separated        10
    Total            70
2. Two-way classification – there are two variables each described by their
respective categories
Example: Opinion of Respondents in Regard to the RH Bill
                         Opinion
    Gender    Agree   Disagree   Don’t Know   Total
    Male        30       15           5          50
    Female      40       10           5          55
    ________________________________________________
    Total       70       25          10         105

Sample and Population:

Population (N) – the totality of observations
Sample (n) - a subset of a population. It is used when it is impossible or
impractical to observe the entire population.

To determine n:

$$ n = \frac{N}{1 + N e^{2}} $$

where: N = population size
       e = margin (probability) of error
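
The sample-size formula can be checked with a few lines of Python. This is only a sketch: the function name `sample_size`, the choice to round up to a whole unit, and the figures N = 1000 and e = 5% are assumptions made for illustration, not part of the notes.

```python
import math

def sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e**2), rounded up to a whole unit."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Illustrative figures only: N = 1000, e = 0.05
print(sample_size(1000, 0.05))  # 1000 / 3.5 = 285.7..., rounded up to 286
```

Rounding up keeps the sample from falling below the size the formula calls for; some texts round to the nearest whole number instead.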
Kinds of Sampling
1. Random or probability sampling – a method of selecting a sample from the
population where all elements have an equal chance of being selected.
Example: N = 1000, n = 100, p = 100/1000 = 1/10 or 10%

2. Non-random or non-probability sampling – a method of selecting a sample
from the population where not all elements are given an equal chance of being
selected; others are deliberately left out.
Example: Selecting a qualified person from among the applicants
Problem Exercise:
The numbers of pupils enrolled in five (5) grade levels of A-1 Elementary
School are as follows:

Grade 1 - 170
Grade 2 - 130
Grade 3 - 110
Grade 4 - 100
Grade 5 - 80

How many pupils from each grade level will be included in the sample if
the margin of error is 2%?

Levels of Measurements
*Nominal Level - We can put the data into categories.
*Ordinal Level - We can order the data from the least to the most. Each
data value can be compared with another data value.
*Interval Level - We can order the data and also take differences between
data values. At this level, it makes sense to compare the differences
between data values.
*Ratio Level – We can order the data, take the differences, and take the ratio
between data values. For instance, it makes sense to say that one
data value is twice as large as another data value.

Exercise No. 1

1. Kim and James are students of this university.
2. In the high school graduating class of 320 students, John ranked 4th and
Joe ranked 10th, where 1 is the highest rank.
3. The temperature of Bangos pond varies from 30 deg. C to 38 deg. C.
4. A certain NBA basketball player is 4 inches taller than a PBA
basketball player.
STATISTICAL MEASURES
Parameter and Statistic

Parameter – any numerical value describing the characteristics of a population
(the totality of objects; the universe)
µ = population mean
Statistic – a value computed from a sample, i.e. a numerical value
describing the characteristics of a sample
X̄ = sample mean
1. Measures of Central Location for ungrouped data
Describes a group of data, be it a population or a sample. It indicates the
center of a set of data arranged in increasing or decreasing order of
magnitude.
*Average is a measure of the center of a set of data when it is arranged in
increasing order of magnitude.
1a. Mean – interval statistic
1b. Median- ordinal statistic
1c. Mode – nominal statistic
Mean -   Population: $\mu = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$
         Sample: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
Median - The middle value of a set of observations arranged in
increasing or decreasing order of magnitude when the number of
items is odd; the arithmetic mean of the two middle values when
the number of observations is even. It is an ordinal statistic.

Example: Nicotine content of cigarettes in mg

1.9, 2.3, 2.5, 2.7, 2.9, 3.1

Md = (2.5 + 2.7)/2 = 2.6 mg, i.e. 50% of the cigarettes contain less
than 2.6 mg and the remaining 50% contain more than 2.6 mg.

Mode - The value that occurs most often, i.e. the value with the greatest frequency.
It may or may not exist. It is a nominal statistic. If there are two modes,
the set of observations is bimodal.

Characteristics of the measures of central location

1. Mean – the most commonly used measure of central location.
It is easy to calculate and it employs all available information. Its
disadvantage is that it may be adversely affected by extreme values.
Of the three measures, it is the most stable from sample to sample
when the samples are drawn from the same population.
2. Median – easy to compute if the number of observations is
relatively small. It is not influenced by extreme values. The data must be
arranged in increasing or decreasing order of magnitude.
3. Mode – the least used measure. For a small set of data its
value is almost useless. Only in a large mass of data does the
mode have a significant meaning.

Exercise: Scores in Math 1 of 10 students

79, 82, 78, 81, 85, 90, 87, 93, 88, 94
µ = (78 + 79 + 81 + 82 + 85 + 87 + 88 + 90 + 93 + 94)/10 = 85.7
Md = (85 + 87)/2 = 86, i.e. 5 scores are below 86 and the other 5 are
above 86
Mode: none
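
The three measures for the Math 1 scores can be reproduced with a short Python sketch. The helper names `mean`, `median`, and `modes` are illustrative, and returning an empty list when no value repeats is an assumed way of expressing "no mode".

```python
from collections import Counter

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]                        # odd count: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle values

def modes(values):
    """Return all values with the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

scores = [79, 82, 78, 81, 85, 90, 87, 93, 88, 94]
print(mean(scores), median(scores), modes(scores))  # 85.7 86.0 []
```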

Problem Exercise: Scores in Science 1 of n students


45, 50, 51, 50, 60, 65, 75, 80, 50, 85, 90
Calculate: 1. Mean 2. Median 3. Mode

2. Measures of Variation
The three measures of central location do not by themselves give an
adequate description of our data. We need to know how the observations
spread out from the mean.
2a. Range – difference between the largest and smallest measure. It is a
poor measure of variation if the size of the sample or population is
large. It considers only the extreme values and tells nothing
about the distribution of data in between.
2b. Variance – It is a measure of variation/deviation from the mean. An
observation greater than the mean will produce positive deviation,
whereas, an observation smaller than the mean
will produce negative deviation. It is the average of the squares
of the deviations of individual values from the mean.
2c. Standard Deviation – the positive square root of the variance, that is,
the positive square root of the arithmetic mean of the squared deviations
from the mean. It is a measure of the heterogeneity and unevenness
within the set of observations.

Population Variance: $\sigma^{2} = \dfrac{\sum (X - \mu)^{2}}{N}$

Sample Variance: $S^{2} = \dfrac{\sum (X - \bar{X})^{2}}{n - 1}$

Problem illustration:

Scores given by 6 judges to the performance of a gymnast:

X = 7, 5, 9, 7, 8, and 6
Solution: µ = (7 + 5 + 9 + 7 + 8 + 6)/6 = 7

$\sigma^{2} = \dfrac{(7-7)^{2} + (5-7)^{2} + (9-7)^{2} + (7-7)^{2} + (8-7)^{2} + (6-7)^{2}}{6} = \dfrac{10}{6} = 1.67$

$\sigma = \sqrt{1.67} = 1.29$

Sample Variance (alternative formula)

$S^{2} = \dfrac{\sum X^{2} - n\bar{X}^{2}}{n - 1}$, where the sample mean is $\bar{X} = \dfrac{7 + 5 + 9 + 7 + 8 + 6}{6} = 7$
Interpretation: The average score given by the six judges to the
performance of the gymnast is 7, and the scores tend to vary by
about 1.29 above and below the mean.
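
As a check on the arithmetic, the gymnast's scores can be run through both variance formulas in Python. This is a sketch under the notes' own definitions (divide by N for a population, by n − 1 for a sample); the variable names are illustrative.

```python
from math import sqrt

scores = [7, 5, 9, 7, 8, 6]
n = len(scores)
mean = sum(scores) / n

# Population variance: average squared deviation from the mean.
pop_var = sum((x - mean) ** 2 for x in scores) / n
pop_sd = sqrt(pop_var)

# Sample variance via the alternative formula (sum of X^2 - n * mean^2) / (n - 1).
sample_var = (sum(x * x for x in scores) - n * mean ** 2) / (n - 1)

print(round(pop_var, 2), round(pop_sd, 2), sample_var)  # 1.67 1.29 2.0
```

Treating the same six scores as a sample gives a larger variance (2.0) than treating them as a population (1.67), because the divisor is n − 1 instead of N.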

Problem Exercises:
1. Compute for the mean, median, variance, and standard deviation of the
following scores of 12 students in Math 1 E
20, 16, 19, 17, 18, 14, 15, 13, 11, 10, 12, 19

2. Two samples of bottled fruit juices are on display, one bottled by A and the
other by B. If you are to choose between the A and B fruit juices, which one is
your choice?

Sample A: 0.95 li. 1.0 li. 0.93 li. 1.02 li. 1.10 li.
Sample B: 1.06 li. 1.01li. 0.88 li. 0.91 li. 1.14 li.

Comparing the three measures of Central Location and standard
deviation
1. Mean is the most commonly used measure of location. It is easy
to calculate and it employs all available information. It is an
interval statistic. It is adversely affected by extreme values.
2. Median is an ordinal statistic. It is the middle value when the
number of items is odd, and the average of the two middle
values if the number of items is even. It is not affected by extreme values.
3. Mode is the least used measure of location. For a small set of data
its value is almost useless. Only in the case of a large mass of data does it
have a significant meaning. It is a nominal statistic.

Problem Exercise: The data below represent the nicotine content in mg of
9 cigarettes.
1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.20

Calculate the following: (a) mean (b) median (c) mode ( d) variance
(e) standard deviation

MEASURES OF RELATIVE DISPERSION

1. Coefficient of Variation, CV – a relative measure of dispersion used to
compare the variability of two sets of data.
CV = standard deviation / mean
Example: Comparing the variability of dwelling size for units A and B
A: X̄ = 1000 sq. ft.   S = 50 sq. ft.
B: X̄ = 3000 sq. ft.   S = 300 sq. ft.
A: CV = 50 sq. ft. / 1000 sq. ft. = 0.05
B: CV = 300 sq. ft. / 3000 sq. ft. = 0.10
Unit B is more variable than A.
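
The dwelling-size comparison amounts to one division per unit. A minimal Python sketch follows; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion: standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

cv_a = coefficient_of_variation(50, 1000)
cv_b = coefficient_of_variation(300, 3000)
print(cv_a, cv_b)  # 0.05 0.1
```

Because the units cancel, the CV lets two data sets with different means, or even different units, be compared on the same scale.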
Problem Exercise:
Two samples of bottled fruit juices are on display, one bottled
by A and the other bottled by B. If you are the customer, which
sample will you consider buying and why?

Sample A: 0.95 li.  1.0 li.  0.93 li.  1.02 li.  1.10 li.    X̄A = 1.0 li.
Sample B: 1.06 li.  1.01 li.  0.88 li.  0.91 li.  1.14 li.   X̄B = 1.0 li.
Calculate the coefficients of variation.

A. Seat work exercise: Given the scores of first year students in Eng. 1
Section A: 15 20 25 30 35 40 45 50 55 50
Section B: 10 15 20 20 25 35 40 45 45 50
Treating the data as samples, calculate the following:
Mean, Median, Mode, Variance, Standard Deviation, and Coefficient
of Variation (CV). Which section is more variable?

B. Problem Exercise: The following data give the ages of a sample of Grade 3 and
Grade 4 pupils of A-1 Elem. School
Grade 3:  7  8  8  9  7  8  9  8  7  8
Grade 4:  9 10  8  9 10 10  9  7  8 10
Calculate the following: Mean, Median, Mode, Variance, and Coefficient of
Variation (CV). Which group of pupils is more variable?

2. Chebyshev’s Theorem - A useful rule for illustrating the relationship between
dispersion and standard deviation. Named after a Russian mathematician, P.L.
Chebyshev.
The two values most often used are the mean and standard deviation. If the
distribution has a small SD, we would expect most of the values to be grouped
closely around the mean. However, a large value of the SD indicates greater
variability, in which case we would expect the observations to be more spread
out from the mean.

The theorem enables us to calculate, for any set of data (sample or
population), the minimum proportion of values that can be expected to lie
within a specified number K of standard deviations of the mean. The theorem
tells us that at least 75% of the values in a set of data can be expected to fall
within two std. deviations of the mean, at least 88.9% within three std.
deviations of the mean, and at least 96% of the measurements must lie within
five standard deviations.

Furthermore, it enables us to make statements about the proportion of
data that must be within a specified number of standard deviations, K SD, from
the mean: “At least (1 − 1/K²) of the data values must be within K std. deviations
of the mean,” where K is greater than 1.
For K = 2, the theorem states that at least (1 − 1/K²) = (1 − 1/4) = 3/4 or 75%
of the measurements must lie within 2 SD on both sides of the mean. That is, 3/4
or more of the observations must lie in the interval µ ± 2 SD.
Exercises: What percent of the observations lie within K = 3? K = 4?

“Given a set of n observations X1, X2, ..., Xn on variable X, the probability is at
least $1 - \frac{1}{K^{2}}$ that X will take on a value within K std. deviations of the
mean of the set of observations, where K is greater than 1.” The theorem is
applicable to both sample and population.

Example. Suppose a set of data has a mean of 150 and a std. deviation of 25. By
Chebyshev’s theorem with K = 2, the probability is at least 1 − 1/K² = 0.75 that X
will take on a value within K SD = 2(25) = 50 of the mean, that is, between
150 − 50 and 150 + 50, or 100 to 200. Consequently, we can say that at least 75% of the
values are found between 100 and 200, the interval X̄ ± K SD.

Note: Chebyshev’s theorem enables us to make statements about the proportion
of data values that must be within a specified number of std. deviations of the
mean (X̄ ± K SD).
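
The bound and the interval from the example (mean 150, SD 25, K = 2) can be sketched in Python as follows. The function names `chebyshev_bound` and `chebyshev_interval` are illustrative, and raising an error for K ≤ 1 is an assumed way of enforcing the theorem's condition.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, std_dev, k):
    """Interval mean - k*SD to mean + k*SD."""
    return (mean - k * std_dev, mean + k * std_dev)

print(chebyshev_bound(2))              # 0.75
print(chebyshev_interval(150, 25, 2))  # (100, 200)
```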

* Problem Exercise: If the IQ scores of a random sample of 1080 college
students have a mean score of 120 and a std. dev. of 8, determine the interval
containing at least 810 of the IQ scores.
Solution: 1 − 1/K² = 810/1080 = 3/4;  1 − 3/4 = 1/K²;  1/4 = 1/K²;  K = 2
then,
X̄ ± KS = 120 ± 2(8) = 120 ± 16, i.e. from 120 − 16 = 104 to 120 + 16 = 136
thus, at least 3/4 of 1080, or 810, of the IQ scores are found in the interval
104 – 136.
Note: The variable X will take on a value within KS of the mean, thus X̄ ± KS, where
K refers to the number of standard deviations and S refers to the standard
deviation. If K = 2 and S = 8, then KS = 2(8) = 16.

3. Z-Score measures the number of std. deviations the variable X is from the
mean. It is a measure of the relative location of the observation in a data set.
Observations in two different data sets with the same Z-score can be said
to have the same relative location in terms of the same number of
standard deviations from the mean.

$Z = \dfrac{X - \mu}{\sigma}$ or $Z = \dfrac{X - \bar{X}}{S}$

If the Z-score is positive, the observation X is above the mean; if the
Z-score is negative, the observation X is below the mean. The Z-score is
used to compare two observations from
two different populations in order to determine their relative rank.
A method of ranking the two observations is to convert the individual
observations into standard deviation units known as Z-scores or Z values.

Problem Illustration.
Let us assess the accomplishment of a student in Math 1 and Science 1. The
student’s score in Math 1 is 82 and in Science 1 is 89. Can we conclude that the
student is better in Science 1?
Soln. We should consider how the student performed relative to other students
in each class. For Math 1 the mean score is 68 with an SD of 8, while in the Science 1
class the mean score is 80 with an SD of 6.

Math 1:     µ = 68   σ = 8   X = 82   Z = (X − µ)/σ = (82 − 68)/8 = +1.75
Science 1:  µ = 80   σ = 6   X = 89   Z = (X − µ)/σ = (89 − 80)/6 = +1.50

Since the Z-score of the student in Math 1 is larger than his Z-score in Science 1, the
student’s relative performance in Math 1 is better than his performance in
Science 1.
Conclusion: Relative to the class, the student did better in Math 1.
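
The two Z-scores above can be verified with a small Python sketch; the function name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_math = z_score(82, 68, 8)
z_science = z_score(89, 80, 6)
print(z_math, z_science)  # 1.75 1.5
```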

Problem Exercises:
1. Let us assess the encoding speed of an applicant to see whether he is suited for the
Dean’s Office, Bus. Office, or Personnel Office.

Office       Applicant’s Score   Standard Speed   Std. Deviation
_________________________________________________________
Dean         141 sec             180 sec          30 sec
Business     7 min               10 min           2 min
Personnel    33 min              26 min           5 min
_________________________________________________________
For which office does the applicant seem to be suited?
Since speed is of primary importance, we are looking for the Z-score that
represents the greatest no. of SD’s to the left of the mean (negative Z-score).

2. Two college students took final exams in Sociology from two
different professors. Their scores are as follows:
                     John’s Class   Bill’s Class
Class mean score     75             88
Student’s score      85 (John)      92 (Bill)
Std. Dev.            6.5            4.0

By examining each student’s respective position in his class, who did
better relative to the other members of the class?

3. Compare the variability of the nicotine content of two samples of cigarettes
Type A: 1.2 mg; 1.3; 1.4; 1.5; 1.6; 1.7; 1.8; 1.9; 2.0; 2.1; 2.5
Type B: 1.3 mg; 1.5; 1.4; 1.6; 1.2; 1.7; 1.9; 2.1; 2.3; 2.5; 2.6

4. A certain type of flight of Bacolod City Airlines carried a daily average of
78 passengers with a standard deviation of 13 passengers. The management
needed to know how dispersed the numbers of passengers are, for such an
estimate is needed in order to make decisions which will maximize efficiency.
The management wanted to know what percent of the time and how many
passengers are within 2.5 SD and what the intervals are.

5. A study of the nicotine content of a certain brand of cigarettes shows that on the
average one cigarette contains 1.52 mg of nicotine with a standard deviation
of .08 mg. Between what values must the nicotine content be for
a) at least 24/25 of all cigarettes of this brand
b) at least 48/49 of all cigarettes of this brand

6. Suppose a set of data has a mean of 150 and a std. dev. of 25. Between what
values will X fall if K = 2; K = 3; K = 4?

DESCRIPTIVE MEASURES FROM GROUPED DATA

Grouped data – data presented in a frequency distribution
Frequency distribution – a tabular arrangement of data. It lists all data into classes or
class intervals with the corresponding number of values that belong to each class.
Class limits – the smallest and largest values that can fall in a given class interval

    Class interval (c.i.)    frequency (f)
    10 – 14                  2
    15 – 19                  4
    20 – 24                  9
    25 – 29                  6
    30 – 34                  5

In the last class interval, 30 is the lower class limit and 34 is the upper class limit.

Class boundary – the exact class limits of an interval

    Lower Class       Class interval   Upper Class       Class boundary   Class Mark
    Boundary (LCB)    (c.i.)           Boundary (UCB)                     (CM)
    9.5               10 – 14          14.5              9.5 – 14.5       12
    14.5              15 – 19          19.5              14.5 – 19.5      17
    19.5              20 – 24          24.5              19.5 – 24.5      22
    24.5              25 – 29          29.5              24.5 – 29.5      27
    29.5              30 – 34          34.5              29.5 – 34.5      32

Class Mark (CM) – the midpoint of the class interval

How to construct frequency distribution table:

(1) Range = H – L, where H is the highest and L the lowest measure
(2) Decide on the number of class intervals. Use Sturges’ rule unless the number of
c.i. is specified: No. of c.i. = 1 + 3.32 log10 N

(3) Class width, W = Range / no. of c.i.; it should have the same number of
significant places as the data measures

(4) Construct the frequency table with the first class interval beginning with the
lowest measure
(5) Tabulate
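
Steps (1) to (3) can be sketched in Python. The figures used (50 measures ranging from 15 to 35) are illustrative, and the rounding choices are assumptions: the number of classes is rounded to the nearest whole number and the width is rounded up to a whole unit, which suits integer data. The function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of class intervals by the rule 1 + 3.32 * log10(n), rounded."""
    return round(1 + 3.32 * math.log10(n))

def class_width(highest, lowest, n_classes):
    """Range divided by the number of classes, rounded up to a whole unit."""
    return math.ceil((highest - lowest) / n_classes)

k = sturges_classes(50)      # 1 + 3.32 * 1.699 = 6.64 -> 7
w = class_width(35, 15, k)   # 20 / 7 = 2.86 -> 3
print(k, w)  # 7 3
```

Seven classes of width 3 starting at the lowest measure, 15, give the intervals 15 - 17, 18 - 20, and so on up to 33 - 35.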

Problem Exercises
1. Construct the frequency distribution table for speeds in KPH of n cars plying
the downtown area

15 20 26 31 34 16 32 18
18 25 25 33 33 17 28 22
20 23 30 35 25 19 26 29
21 24 28 30 20 22 24 33
22 27 32 31 25 24 25 30

2. Construct the frequency distribution table for the lifetimes in years of n batteries

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.7 4.4 3.2 4.1 1.9 3.4 3.2
4.8 3.8 3.2 2.6 3.9 3.0 4.2 3.4
Calculate for: a. Mean b. Median c. Mode d. Variance e. Std. Deviation

3. Minimum monthly salary in P10^3 (thousands of pesos) of selected professionals in Bacolod City

15 20 26 31 34 16 32 18 19 26
18 25 25 33 33 17 28 22 20 33
20 23 30 35 25 19 26 26 24 31
21 24 28 30 20 22 24 33 30 28
22 27 32 31 25 24 25 30 22 23

Calculate for: (a) Mean (b) Median (c) Mode (d) Variance
(e) Standard Deviation

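MEASURES OF CENTRAL LOCATION AND VARIATION
(For Grouped Data)

(1) Mean: $\mu = \dfrac{\sum f\,CM}{N}$ or $\bar{X} = \dfrac{\sum f\,CM}{n}$

(2) Median: $Md = LCB + \left[\dfrac{N/2 - LCF}{f_m}\right] w$

(3) Mode: $Mo = LCB + \left[\dfrac{d_1}{d_1 + d_2}\right] w$

(4) Variance: $S^{2} = \dfrac{\sum f\,CM^{2} - n\bar{X}^{2}}{n - 1}$

(5) Standard Deviation: $S = \sqrt{S^{2}}$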
Illustration:
Minimum Monthly Salary in P10^3 of selected professionals in Metro
Bacolod

Class interval    f     CM    LCB     LCF    fCM     fCM²
15 - 17           3     16    14.5      3     48      768
18 - 20           9     19    17.5     12    171     3249
21 - 23           7     22    20.5     19    154     3388
24 - 26          13     25    23.5     32    325     8125
27 - 29           4     28    26.5     36    112     3136
30 - 32           8     31    29.5     44    248     7688
33 - 35           6     34    32.5     50    204     6936
Total            50                          1262    33290

Compute for: Mean, Median, Mode, Variance, Standard Deviation
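
One way to carry out this computation is sketched below in Python, starting from the class limits and frequencies of the illustration. Several readings are assumptions, since the notes do not spell them out: LCF is taken as the cumulative frequency below the median class, fm as the frequency of the median class, d1 and d2 as the differences between the modal class frequency and the frequencies of the classes before and after it, and the class boundary as the lower limit minus 0.5 (integer limits). The function name `grouped_stats` is hypothetical.

```python
from math import sqrt

# Class limits and frequencies from the salary illustration.
classes = [(15, 17), (18, 20), (21, 23), (24, 26), (27, 29), (30, 32), (33, 35)]
freqs = [3, 9, 7, 13, 4, 8, 6]

def grouped_stats(classes, freqs):
    n = sum(freqs)
    width = classes[1][0] - classes[0][0]            # class width w
    marks = [(lo + hi) / 2 for lo, hi in classes]    # class marks CM
    lcbs = [lo - 0.5 for lo, _ in classes]           # lower class boundaries (integer limits assumed)
    sum_fcm = sum(f * m for f, m in zip(freqs, marks))
    sum_fcm2 = sum(f * m * m for f, m in zip(freqs, marks))
    mean = sum_fcm / n

    # Median: first class whose cumulative frequency reaches n/2.
    below = 0
    for i, f in enumerate(freqs):
        if below + f >= n / 2:
            median = lcbs[i] + (n / 2 - below) / f * width
            break
        below += f

    # Mode: class with the highest frequency (first one if tied).
    i = freqs.index(max(freqs))
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i < len(freqs) - 1 else 0)
    mode = lcbs[i] + d1 / (d1 + d2) * width

    variance = (sum_fcm2 - n * mean ** 2) / (n - 1)
    return mean, median, mode, variance, sqrt(variance)

mean, median, mode, variance, sd = grouped_stats(classes, freqs)
print(round(mean, 2), round(median, 2), round(mode, 2), round(variance, 2), round(sd, 2))
# 25.24 24.88 24.7 29.33 5.42
```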

Problem Exercise:
Frequency Distribution of Battery Lifetime
Class interval f CM LCB LCF fCM fCM 2
1.5 - 1.9 2
2.0 - 2.4 1
2.5 - 2.9 4
3.0 - 3.4 15
3.5 - 3.9 10
4.0 - 4.4 5
4.5 - 4.9 3

Complete the table and calculate for : Mean , Median, Mode, Variance, Std.
Deviation

PROBABILITY

Probability - the numerical likelihood, measured between 0 and 1.0, that an uncertain
event will occur

Approaches to probability:
1. Classical Approach - most often associated with gambling and games of
chance

P(E) = No. of outcomes favorable to the event / Total possible outcomes
Where: S = sample space or total possible outcomes

Example: Tossing two coins (Coin 1 and Coin 2) once
Possible outcomes: H1H2  H1T2  T1H2  T1T2     S = 4
Thus, P(two heads) = 1/4
      P(one head and one tail) = 2/4
      P(two tails) = 1/4
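
The two-coin example can be reproduced by listing the sample space and counting favorable outcomes. This is a minimal Python sketch; the helper name `probability` and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins once: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

def probability(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_two_heads = probability(lambda o: o.count("H") == 2)
p_one_each = probability(lambda o: o.count("H") == 1)
p_two_tails = probability(lambda o: o.count("T") == 2)
print(p_two_heads, p_one_each, p_two_tails)  # 1/4 1/2 1/4
```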

2. The Relative Frequency Approach - uses past data that have been empirically
observed. It notes the frequency with which some event has occurred in the
past and estimates the probability of its recurrence on the basis of historical
data.
P(E) = No. of times the event has occurred / Total number of observations
Illustration: The price of oil, which changes almost weekly, observed over 100 days
Declined  = 20 days    P(declined)  = 20/100 = 20%
Increased = 50 days    P(increased) = 50/100 = 50%
Unchanged = 30 days    P(unchanged) = 30/100 = 30%

3. Subjective Approach - a probability estimate made on the basis of one’s best
judgment. It is used where one wants to assign a probability to an event that
has never occurred.

Example: The probability that a woman will be elevated to the status of Pope.
Are there data to rely on?
Subjective Probability (or Personal Probability) - By the classical definitions, we can
find objective probability values, which indicate the relative rate of
occurrence of the event in the long run,
but these values rest on actual observations.
In many cases, only a small number of past outcomes of the
event may be available, or there may not be any
past outcomes to examine, as in the case of marketing a new product. In this case
the probability of success depends on PERSONAL JUDGMENT.

A.N. Kolmogorov – a Russian mathematician who developed the modern concept of
probability, which combines both the objective and subjective concepts.
The definition is: “with every event A in a finite sample space S, we associate a
real number P(A), which is called the probability of event A, if it satisfies the
following axioms.” [S is the set of all possible outcomes, which is called the sample
space or probability space.]
1. The probability of an event always lies between 0 and 1.0: 0 ≤ P(E) ≤ 1.0
2. P(∅) = 0
3. P(A or B) = P(A) + P(B), where A and B are mutually exclusive events

Theorems of Probability

Rules of Probability
1. Additive Rule – applies to the union of events
1a. Mutually exclusive events, where no two events can occur together.
Such events do not have any common outcomes. If two or more events
are mutually exclusive, then at most one of them will occur. Mutually
exclusive events are disjoint events.

Example: If A and B are mutually exclusive, then the occurrence of one event
excludes the occurrence of the other event. The probability of occurrence of
either A or B is the sum of the individual probabilities of A and B:
P(A or B) = P(A) + P(B)

1b. Non-mutually exclusive events. The simple addition theorem no longer applies.
When events are not mutually exclusive, the probability that at least
one of two events A and B will occur is the sum of their probabilities
minus the probability that both occur. Such events have
common elements.
P(A or B) = P(A) + P(B) – P(AB)
Example: What is the probability of drawing an Ace or a
Diamond from a deck of playing cards?
N = 52 cards   A = 4   D = 13   AD = 1
P(A or D) = P(A) + P(D) – P(AD)
          = 4/52 + 13/52 – 1/52 = 16/52
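
The Ace-or-Diamond result can be confirmed both by the addition rule and by direct counting over a 52-card deck. The Python sketch below builds the deck as (rank, suit) pairs; the names `deck` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Fraction of cards in the deck that satisfy the event."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")
p_diamond = prob(lambda c: c[1] == "diamonds")
p_both = prob(lambda c: c[0] == "A" and c[1] == "diamonds")

by_rule = p_ace + p_diamond - p_both                           # addition rule
by_count = prob(lambda c: c[0] == "A" or c[1] == "diamonds")   # direct count
print(by_rule, by_count)  # 4/13 4/13
```

Both routes give 16/52, which reduces to 4/13; subtracting P(AD) prevents the Ace of Diamonds from being counted twice.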
2. Multiplication Rule
2a. Independent Events – the occurrence of one event does not affect the
probability of occurrence of the other event
P(A and B) = P(A) x P(B)
Example: Tossing a coin twice
1st toss: P(H1) = 1/2; P(T1) = 1/2
2nd toss: P(H2) = 1/2; P(T2) = 1/2
P(H1 and H2) = 1/2 x 1/2 = 1/4
P(T1 and T2) = 1/2 x 1/2 = 1/4
P(H1 and T2) = 1/2 x 1/2 = 1/4
P(T1 and H2) = 1/2 x 1/2 = 1/4
____
Total = 4/4 = 1.0
2b. Dependent events or conditional probability – the occurrence of
one event is conditioned by the occurrence of the other event
P(A and B) = P(A) x P(B/A) (Event B conditioned by A)
or P(B) x P(A/B) (Event A conditioned by B)
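
A short Python sketch of the rule for dependent events follows. The example, drawing two aces in a row without returning the first card, is hypothetical and was chosen only to illustrate P(A and B) = P(A) x P(B/A).

```python
from fractions import Fraction

# Hypothetical example: two aces in a row from a 52-card deck,
# without returning the first card to the deck.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card fewer remain
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```

If the first card were returned, the draws would be independent and the second factor would stay at 4/52.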
Problem Exercise: There are 7 non-defective and 3 defective items in a box.
If two items are drawn one after the other, what
is the probability of N1N2, N1D2, D1N2, D1D2?

Given: N = 7
D = 3

3. Complementary Events. Two events are complementary if the failure of
one event to occur means the other event must occur.

P(A) + P(not A) = 1.0

Problem Exercises:
1. 1a. Rolling 2 dice, d1 and d2, what is the probability of getting a total of
7 or 11?
1b. What is the probability of getting a total of 7 or die 1 being less than 4
when a pair of dice is tossed?
1c. What is the probability that a throw of two dice totals less than 8
or equal to 8?

2. On a certain freeway, your chance of getting off at Exit 1 is 60%. If you
get off at Exit 1, your chance of getting lost is 30%. If you miss Exit 1 and
have to get off at Exit 2, your chance of getting lost is 70%. What is
the probability of getting lost? Of not getting lost?
3. Two cards are drawn from a deck. Find the probability that an Ace and a
King will appear if the first card is (a) returned (b) not returned.

4. A group of 200 appliance owners has the following distribution of
appliances:

(WM) Washing machine = 110      WM and D = 40
(D) Drier = 50                  DW and D = 25
(DW) Dish washer = 60           WM and DW = 35
All three = 20
Find the probability that an owner selected at random has
(a) WM and/or Drier and/or DW
(b) WM only
(c) DW only
(d) Drier only

5. Four students of the university, A, B, C, and D, are considered for a
two-member delegation to represent the school at an international student
leadership conference. What is the probability that
(a) A will be selected
(b) B or C will be selected
(c) C and D will be selected
(d) A will not be selected
6. The probability that a car being filled with gasoline will also need an oil
change is 25%, the probability that it needs a new oil filter is 40%, and the
probability that both the oil and the oil filter need changing is 14%.
(a) If the oil has to be changed, what is the probability that a new filter is
needed?
(b) If a new filter is needed, what is the probability that the oil has to be
changed?

7. In a college graduating class of 200 students, 54 studied
mathematics, 69 studied history, and 35 studied both
mathematics and history. If one of these students is
selected at random, find the probability that
(a) The student takes mathematics or history;
(b) The student does not take either of these subjects;
(c) The student takes history but not mathematics.

8. One bag contains 4 white balls and 3 black balls, and a second bag
contains 3 white balls and 5 black balls. One ball is drawn from the first
bag and placed unseen in the second bag. What is the probability that
ball now drawn from the second bag is black?

9. A town has one fire engine and one ambulance available for
emergencies. The probability that the fire engine is available when called is 95%,
and the probability that the ambulance is available when called is 90%. In the event of an injury
resulting from a burning building, find the probability that (a) both the
fire engine and the ambulance are available, (b) the fire engine
is available and the ambulance is not.

10. The probability that the lady of the house is at home when
the Avon representative calls is 0.6. Given that the lady of
the house is at home, the probability that she makes a purchase is
0.4. Find the probability that the lady of the house is home and
makes a purchase when the Avon representative calls?

11. Two cards are drawn from a deck. Find the probability of drawing an
Ace and a King if the first card drawn is (a) returned to the deck
(b) not returned to the deck

12. A number is selected at random from 1 to 50. What is the probability
of it being divisible by 3 or 7?
14. A pair of dice is thrown. If it is known that a first die shows a 4, what
is the probability (a) the other die shows a 5? (b) the total of both
dice is greater than 7 ?
15. The probability that a regularly scheduled flight of airplane departs
on time is 83%, the probability that it arrives on time is 92% and the
probability that it departs and arrives on time is 78%.
Find the probability that the airplane,
(a) arrives on time given it departed on time
(b) departed on time given it has arrived on time

PERMUTATION AND COMBINATION

Permutation is an arrangement of items/objects in which
composition and order are important.

Example: How many arrangements of size 3 can we make from
items A, B, and C?
ABC  BCA       3P3 = 3! = 3·2·1 = 6
ACB  CAB
BAC  CBA
The six different permutations are obtained by simply
reordering the elements. Since with permutations order
makes a difference, a different ordering yields a different
permutation.

Theorem 1. There are n items taken all at a time:
nPn = n!       e.g. 3P3 = 3! = 3·2·1 = 6

Theorem 2. There are n elements taken r at a time. When the
number of elements is too large to permit a complete
listing, the formula is nPr = n!/(n−r)!

Example. Number of arrangements of 5 elements taken 3 at a time:
5P3 = 5!/(5−3)! = (5·4·3·2·1)/(2·1) = 60

Theorem 3. There are n items taken all at a time but some
items are repeated:
nPn = n!/(n1! n2! n3! ...)

Example. In how many ways may two girls and two boys be
seated in a row of four seats?

BGGB  GBBG       4P4 = 4!/(2! 2!) = (4·3·2·1)/(2·1·2·1) = 6
GBGB  GGBB
BGBG  BBGG
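
The three permutation theorems translate directly into Python using the standard `math.factorial`. The function names below are illustrative, not notation from the notes.

```python
import math

def permutations_all(n):
    """Theorem 1: n items taken all at a time, n!."""
    return math.factorial(n)

def permutations_r(n, r):
    """Theorem 2: n items taken r at a time, n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

def permutations_repeated(n, *group_sizes):
    """Theorem 3: n items with repeated groups, n! / (n1! n2! ...)."""
    denom = 1
    for size in group_sizes:
        denom *= math.factorial(size)
    return math.factorial(n) // denom

print(permutations_all(3), permutations_r(5, 3), permutations_repeated(4, 2, 2))  # 6 60 6
```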
Combination is a selection of items in which only composition
is important. Order makes no difference.
Theorem. There are n elements taken r at a time:
$nCr = \dfrac{n!}{r!\,(n-r)!}$
Example. There are 10 contestants, 3 of whom are selected as
winners with no distinction as to the order in which
they finished the competition.
In how many ways can the winners be selected?
10C3 = 10!/(3!(10−3)!) =
(10·9·8·7·6·5·4·3·2·1)/((3·2·1)(7·6·5·4·3·2·1)) = 120

There are 120 ways in which 3 contestants out of 10
can be selected as worthy of recognition. The probability that
any particular group of 3 contestants is selected, with no distinction
as to the order, is 1/120 ≈ 0.83%.

Exercise 1. Suppose 10 contestants enter the Bacolod
City contest. Three beauties will be selected
from the 10 contestants and awarded P20K for 1st,
P15K for 2nd, and P10K for 3rd place. How many
arrangements can be made?
Solution. Since order is important, we are dealing with
permutations. How many permutations are there for 1st, 2nd, and 3rd
places?
10P3 = 10!/(10−3)! = 720

There are 720 ways in which 1st, 2nd, and 3rd places can
be selected by the judges from 10 contestants. Given that
each contestant has an equal likelihood of being selected
for each of the three places, there is a probability of 1/720 that
any three contestants will finish in a specific order. P = 1.39 x 10^-3

Exercise 2. In how many ways can 3 girls and 3 boys be seated
in a row of 6 chairs if girls are to be seated alternately with
boys?
Solution: The specification that the girls sit alternately with
the boys makes the set of boys separate from the set of girls, thus
3P3 = 3! = 6 represents the permutations of girls.
Similarly, the permutations of boys will be
3P3 = 3! = 6
Since every arrangement of girls can be combined with each
of the 6 permutations of boys, the number of permutations with girls sitting
on one side, say to the left of the boys, would be (3P3)(3P3) = (6)(6) = 36
However, since girls may also sit to the right of the boys, this would
involve the whole set of 36 permutations taken once again, this time in
reversed order.
Thus, the permutations would be 2[(3P3)(3P3)] = 2 x 6 x 6 = 72
\

Exercises for Combination:

1. A box contains 5 red and 6 green balls. In how many ways can one pick
a group of 5 balls, 3 of which are red?
Solution. The combinations of red balls are separate from those of
the green balls.
But every combination of red balls can be combined with each
combination of green balls. Thus,
5C3 = 5!/[3!(5-3)!] = (5·4·3·2·1)/[(3·2·1)(2·1)] = 10 represents the combinations
of red balls possible
6C2 = 6!/[2!(6-2)!] = (6·5·4·3·2·1)/[(2·1)(4·3·2·1)] = 15 represents the combinations
of green balls possible
Thus, the total combinations of green balls with red balls amount to
(5C3)(6C2) = (10)(15) = 150
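The combination and permutation results above can be reproduced with Python's standard library. This is a minimal sketch; the labels `G1`, `B1`, and so on are made-up names for the girls and boys of Exercise 2.

```python
from itertools import permutations
from math import comb, perm

# 10 contestants, 3 winners, order ignored (combination).
winner_groups = comb(10, 3)

# Exercise 1: 1st, 2nd and 3rd place, order matters (permutation).
podium_orders = perm(10, 3)

# Box of 5 red and 6 green balls: 3 red AND 2 green, so the counts multiply.
red_green_groups = comb(5, 3) * comb(6, 2)

# Exercise 2 by brute force: seat 6 people and keep rows that alternate G/B.
people = ["G1", "G2", "G3", "B1", "B2", "B3"]
alternating_rows = sum(
    1
    for row in permutations(people)
    if all(row[i][0] != row[i + 1][0] for i in range(len(row) - 1))
)

print(winner_groups, podium_orders, red_green_groups, alternating_rows)
```

Counting the alternating rows directly is a useful check on the "multiply by 2" step, since it does not rely on the argument about which side starts.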

Problem Exercises:
1. How many groups of five students can be drawn in any order from 7
students?
2. The President of the A-1 Company must select four of her six vice
presidents to handle problems when they arise. How many different
arrangements of vice presidents can the president devise?
3. Five lifeguards are available for duty one Saturday afternoon. There are 3
lifeguard stations. In how many ways can three lifeguards be chosen and
ordered among the stations?
4. A company has hired 15 new employees, and must assign 6 to the day
shift, 5 to the graveyard shift, and 4 to the night shift. In how many ways
can the assignment be made?
5. In how many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged
in a string of Christmas tree lights with 9 sockets?
6. From 5 male and 4 female engineering students, find the number of
committees of 4 that can be formed with three males and one female.
7. Five manufacturers produce a certain electronic device whose quality
varies from manufacturer to manufacturer. If you are to select three
manufacturers at random, what is the chance that the selection would
contain exactly two of the best three?

PROBABILITY DISTRIBUTION OF RANDOM VARIABLES

Probability Distribution - display of all possible outcomes of a random
event, i.e. a list of all possible outcomes with a corresponding probability
associated with each value of a random variable. The sum of all of the
corresponding probabilities is equal to 1.0
Random variable – variable whose value is determined by the outcome of a
random experiment
Kinds of random variable:
1. Discrete random variable – variable which can be counted. It takes on
integer values such as 0, 1, 2, ..., 10
Example: Number of defective resistors
2. Continuous random variable – variable which can be measured. It takes
on all values within a certain interval

Types of Probability Distribution:

1. Binomial distribution – refers to the probability that a certain number of
events will occur in a given number of trials.
Example: Flipping a coin ten times. What is the probability of 5
heads in 10 trials?

2. Poisson distribution – Developed by the French mathematician Siméon
Poisson. It measures the probability of a random event over some
interval of time or space. It is often used to describe the number of arrivals
of customers per hour, the number of industrial accidents per month,
the number of defective electrical connections per mile of wiring in a
city’s power system, or the number of machines that have broken down and
are awaiting repair.
Each random variable (customers, defects, accidents) is
measured per unit of time or space (distance).

Binomial Distribution – based on the Bernoulli process, named after Jacob
Bernoulli, Swiss mathematician
Conditions/Terms of Binomial Distribution
1. There must be only two possible outcomes, Success or Failure.
However, no judgment of good or bad is attached to these outcomes.
2. Each trial is independent
3. The probability of success remains constant for all trials
Note. Each trial in a binomial distribution results in one of only two
mutually exclusive outcomes, one of which is identified as success and the
other as failure. The probability of each remains constant from one
trial to the next.

Approaches to Binomial Distribution

1. Binomial Expansion - used when probability of success is equal to
probability of failure
Example. (a + b)^2 = a^2 + 2ab + b^2
(a + b)^3 =
(a + b)^5 =
2. Bernoulli Equation: $P(x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$
Where: p = probability of success in any given trial
q = probability of failure in any given trial, 1 – p
n = number of trials
! = factorial sign
x = discrete random variable

Example: A credit manager of American Express card has found that
10% of their card users do not pay the full amount of
indebtedness during any month. She wants to determine the
probability that of the 15 accounts randomly selected, 5 of them
are not paid.
Given: p = 10% q = 1 - .10 = .90 n = 15 x = 5

$P(x=5) = \frac{15!}{5!\,(15-5)!}(.10)^5(.90)^{15-5} = .0105$ or 1.05%
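The Bernoulli equation translates directly into code. The sketch below evaluates the credit-card example; the function name `binomial_pmf` is an illustrative choice, and the "at least one" line is an extra check not asked for in the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

n, p = 15, 0.10

p_five_unpaid = binomial_pmf(5, n, p)

# "At least one" is easiest through the complement of "none".
p_at_least_one_unpaid = 1 - binomial_pmf(0, n, p)

# The probabilities over x = 0..n form a complete distribution.
total_probability = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(round(p_five_unpaid, 4))
print(round(p_at_least_one_unpaid, 4))
```

The same function, with different n, p and x, covers each of the binomial problem exercises that follow.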

Problem Exercises:
1. A salesman for a local distributor of housewares makes 10 house
calls a day. The probability that he makes a sale on any given call is
20%. What is the probability that today he will make,
a. no sale
b. at least one sale
c. 2 sales
2. The Bus. office revealed that 10% of the college students cannot
settle their accounts at the end of the semester. What is the
probability that of the 15 students selected at random,
a. 5 cannot settle their accounts
b. none fails to settle their account
c. at least one cannot settle their account
d. at most 2 cannot settle their accounts
3. A company makes color TVs, 5% of which are defective. Ten color
TVs are shipped to the dealer. If each TV set assembled is
considered an independent entity, what is the probability that the
shipment of TVs contains,
a. no defective TV
b. at least one defective TV
c. fewer than 2 defective TVs
d. exactly two defective TVs
4. Records show that the A-1 Computer is up and running 80% of the
time. Bert, the resident computer jock of A-1, examines the
computer nine times. What is the probability that it is functional
exactly 3 times? That it is functional all nine times Bert inspects the
computer?
5. Five persons are required to operate a chemical process; the
process cannot be started until all 5 work stations are manned.
Employee records indicate that there is a 40% chance of any one
coming late and that they all come independently. The management is
interested in knowing the probability of anyone coming late so
that back-up personnel can be arranged. What is the probability
of (a) none coming late, (b) 1 or 2 or 3 coming late?

The Poisson Distribution

The Poisson distribution measures the number of occurrences of an event
per unit of time or space interval.

Conditions of Poisson distribution

1. The numbers of successes in two disjoint time intervals or regions of
space are independent.
2. The probability of success for a small time interval or region of
space is proportional to the length of the time interval or the size of
the region of space.
3. The probability of two or more successes in a small time interval or
region of space is negligible.
Poisson Equation:

$P(x) = \frac{\lambda^x}{e^{\lambda}\, x!}$

Where: X = Poisson random variable with possible values 0, 1, 2, ...
λ = mean number of successes in a given time interval or region of space
e = base of natural logarithms = 2.7183

Example: If the number of telephone calls a secretary receives from 9:00 to
9:10 a.m. averages 4, what is the probability that in the same time
interval on another day, the secretary will receive,
a. no call
b. at least one call
c. at most two calls
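The three parts of the telephone example can be worked with a few lines of Python. This is a sketch that assumes the Poisson equation as stated above, with λ = 4 calls per ten-minute interval; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean is lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 4  # average number of calls from 9:00 to 9:10

p_no_call = poisson_pmf(0, lam)                              # part a
p_at_least_one = 1 - p_no_call                               # part b, by complement
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))   # part c: x = 0, 1, 2

print(round(p_no_call, 4))
print(round(p_at_least_one, 4))
print(round(p_at_most_two, 4))
```

Part b uses the complement because "at least one" would otherwise require an unbounded sum over x = 1, 2, 3, and so on.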
Note: Poisson and binomial distributions have approximately the same shape
when n is large and the probability p is close to zero. If these conditions hold, the
Poisson distribution with λ = np can be used to approximate binomial
probabilities.
Example. Suppose that on the average 1 person in every 1000 is alcoholic. Find
the probability that a random sample of 8000 people will yield fewer than 7
alcoholics.
Soln. This is essentially a binomial distribution where n = 8000 and
p = 1/1000 = .001. Since p is very close to zero and n is quite large, we
approximate with λ = np = 8000 x .001 = 8. Each term has the form
$P(x) = \frac{8^x}{2.7183^8\, x!}$, and "fewer than 7" means x = 0, 1, ..., 6:

$P(X < 7) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6) \approx 0.3134$
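To see how close the Poisson approximation is, the sum for the alcoholics example can be computed both ways. This is a sketch under the example's assumptions (n = 8000, p = .001); the function names are illustrative.

```python
from math import comb, exp, factorial

n, p = 8000, 0.001
lam = n * p  # Poisson mean, lambda = np

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# "Fewer than 7" means x = 0, 1, ..., 6.
poisson_fewer_than_7 = sum(poisson_pmf(x, lam) for x in range(7))
binomial_fewer_than_7 = sum(binomial_pmf(x, n, p) for x in range(7))

print(round(poisson_fewer_than_7, 4))   # Poisson approximation
print(round(binomial_fewer_than_7, 4))  # exact binomial, for comparison
```

With n this large and p this small, the two sums agree closely, which is the point of the note above.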

Problem Exercises:

1. On the average, in a certain blind intersection, 3 car accidents happen in a
month. What is the probability that in any given month at this intersection,
a. exactly 5 car accidents will happen
b. at least 2 car accidents will happen
c. fewer than 3 car accidents will happen

2. A certain area in Eastern Visayas is hit on the average by 6 typhoons a year.
Find the probability that this year, this area will be hit by,
a. fewer than 4 typhoons
b. anywhere from 5 to 7 typhoons
3. Cars arrive at a car wash randomly and independently. On average, 4 cars
arrive every 15 min. What is the probability that 5 or more cars will
arrive during a 15 min. period of operation?
4. Customer arrivals at a bank are random and independent. If the mean arrival
rate of customers is three per minute, what is the probability of,
a. exactly four arrivals per minute,
b. at most two arrivals per min,
c. 4 or 5 or 6 arrivals per min.
5. The average number of oil tankers arriving each day at a certain port city is
known to be 10.
The facilities at the port can handle at most 15 tankers per day. What is the
probability that the port is unable to handle all the tankers that arrive (a) on
a given day (b) on one of the next 3 days?
6. Suppose that on the average 1 person in 1000 makes a numerical error in
preparing his income tax return. If 10,000 forms are selected at random and
examined, find the probability that 6, 7, or 8 of the forms will be in error.
7. The probability that a person dies from a certain respiratory infection is .002.
Find the probability that fewer than 5 of the next 2000 so infected will die.

MARGINAL AND CONDITIONAL PROBABILITIES


It is the extension of conditional probabilities presented in a contingency
table.

Example. From a random sample of voters, find the probabilities of their
opinions on the RH Bill.

                 Voter’s Income Status
Opinion      Low    Middle    High    Total
Favor         58      63       27      148
Against       42      77       33      152
Total        100     140       60      300

(1). Prob. of a single event = Subtotal of the desired event / Grand Total
* Prob. the voter favors RH Bill = 148/300

(2). Joint prob. of two events = Observed frequency where the two events meet / Grand Total
* Prob. the voter is against RH Bill and belongs to middle income group = 77/300

(3). Conditional prob. of two dependent events = Observed frequency where the 2 events meet / Subtotal of conditioning event
* Prob. the voter belongs to low income given against RH Bill = 42/152

(4). Prob. of non-mutually exclusive events = Add the prob. of each event and subtract the
joint probability
* Prob. the voter belongs to middle income or against RH Bill = 140/300 + 152/300 – 77/300 = 215/300
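The four rules can be applied mechanically once the contingency table is stored as data. The sketch below uses exact fractions; the dictionary layout and helper names are illustrative assumptions.

```python
from fractions import Fraction

# Observed frequencies: opinion on the RH Bill by income status.
table = {
    "Favor":   {"Low": 58, "Middle": 63, "High": 27},
    "Against": {"Low": 42, "Middle": 77, "High": 33},
}

grand_total = sum(sum(row.values()) for row in table.values())

def opinion_total(opinion):
    return sum(table[opinion].values())

def income_total(income):
    return sum(row[income] for row in table.values())

# (1) Single event: subtotal / grand total.
p_favor = Fraction(opinion_total("Favor"), grand_total)

# (2) Joint probability: cell frequency / grand total.
p_against_and_middle = Fraction(table["Against"]["Middle"], grand_total)

# (3) Conditional probability: cell frequency / subtotal of the conditioning event.
p_low_given_against = Fraction(table["Against"]["Low"], opinion_total("Against"))

# (4) Non-mutually exclusive events: P(A) + P(B) - P(A and B).
p_middle_or_against = (
    Fraction(income_total("Middle"), grand_total)
    + Fraction(opinion_total("Against"), grand_total)
    - p_against_and_middle
)

print(p_favor, p_against_and_middle, p_low_given_against, p_middle_or_against)
```

`Fraction` prints each result in lowest terms, so 148/300 appears as 37/75.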

Exercises:
1. Prob. the voter belongs to high income and favors RH Bill
2. Prob. the voter belongs to low income given favors RH Bill
3. Prob. the voter belongs to middle income or favors RH Bill
4. Of the 1000 employees of A-1, Inc., 300 are males and the rest females. A total of 130
males and 200 females are in favor of the proposed salary scale. What is the probability
that an employee selected at random,
4a. is a male and favors the proposed salary scale
4b. is a female given she is against the proposed salary scale
4c. is a male or against the proposal
4d. both favors the proposal

MULTINOMIAL DISTRIBUTION
The binomial becomes a multinomial if we let each trial have more than two
possible outcomes. Theorem: If a given trial can result in k possible outcomes
E1, E2, ..., Ek with probabilities P1, P2, ..., Pk, then the probability distribution of the random
variables X1, X2, ..., Xk, representing the number of occurrences of E1, E2, ..., Ek in n
independent trials, is

$P(X_1, X_2, \ldots, X_k; P_1, P_2, \ldots, P_k, n) = \frac{n!}{X_1!\, X_2! \cdots X_k!}\, P_1^{X_1} P_2^{X_2} \cdots P_k^{X_k}$

Illustration: If a pair of dice is tossed 6 times, what is the probability of obtaining
a total of 7 or 11 twice, a matching pair once, and any other combination
3 times?
Soln. Prob. of 7 = 6,1; 5,2; 4,3; 3,4; 2,5; 1,6 = 6/36 or 1/6
Prob. of 11 = 6,5; 5,6 = 2/36 or 1/18
Matching pairs = 1,1; 2,2; 3,3; 4,4; 5,5; 6,6 = 6/36 or 1/6
Other combinations: 6 + 2 + 6 = 14, thus 36 – 14 = 22, or 22/36 or 11/18

Events                                    Prob.                               X (no. of occurrences)
Total of 7 or 11                          P1 = 6/36 + 2/36 = 8/36 or 2/9      X1 = 2
Matching pair                             P2 = 1/6                            X2 = 1
Neither a pair nor a total of 7 or 11     P3 = 11/18                          X3 = 3

$P(2, 1, 3; 2/9, 1/6, 11/18, 6) = \frac{6!}{2!\,1!\,3!}(2/9)^2(1/6)^1(11/18)^3 = 0.1127$ or 11.27%
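The dice illustration can be verified by enumerating the 36 outcomes and then applying the multinomial formula. The function name `multinomial_pmf` is an illustrative assumption.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """n!/(x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk"""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    result = Fraction(coefficient)
    for c, p in zip(counts, probs):
        result *= Fraction(p) ** c
    return result

# Enumerate the 36 equally likely outcomes of a pair of dice.
rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p_7_or_11 = Fraction(sum(1 for a, b in rolls if a + b in (7, 11)), len(rolls))
p_pair = Fraction(sum(1 for a, b in rolls if a == b), len(rolls))
p_other = 1 - p_7_or_11 - p_pair  # the three events do not overlap

# 6 tosses: 7-or-11 twice, a matching pair once, anything else three times.
answer = multinomial_pmf([2, 1, 3], [p_7_or_11, p_pair, p_other])

print(p_7_or_11, p_pair, p_other)
print(round(float(answer), 4))
```

Subtracting to get `p_other` is valid here because a matching pair always has an even total, so it can never also be a 7 or an 11.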

Prob. Exercises:
1. The probabilities are 0.40, 0.20, 0.30 and 0.10, respectively, that
a delegate to a certain convention arrived by air, bus, car, or train. What is
the probability that among the 10 delegates selected at random at this
convention, 3 arrived by air, 1 by bus, 4 by car, and 2 by train?

2. Find the probability of obtaining 2 ones, 1 two, 1 three, 2 fours,
3 fives, and 1 six in 10 rolls of a balanced die.

3. The probabilities that the cars for rent at a local airport are available are:
20% Honda, 25% Toyota, 15% Nissan, 18% Hyundai, and 22% Ford. The
proprietor of a car rental agency randomly selects
10 cars to chauffeur delegates from the airport to the convention center.
Find the probability that
2 Ford, 3 Nissan, 2 Honda, 1 Hyundai and 2 Toyota cars are available.

HYPERGEOMETRIC DISTRIBUTION

Probability of selecting x “successes” from the k items labeled “success” and
n-x from the N-k items labeled “failure”, when a random sample of size n is
selected from a finite population of size N.
Properties of hypergeometric experiment
1. A random sample of size n is selected from a population of N items
2. k of the N items may be classified as successes and N-k classified as
failures
3. X = hypergeometric random variable

Theorem: If a population of size N contains k items labeled as “success” and
N-k items labeled as “failure”, then the probability distribution of the
hypergeometric random variable X, the number of successes in a
random sample of size n, is

$P(x; N, n, k) = \frac{{}_{k}C_{x} \cdot {}_{N-k}C_{n-x}}{{}_{N}C_{n}}$

Illustration: If 5 cards are dealt from a standard deck of 52 playing
cards, what is the probability that 3 will be hearts?
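The card illustration follows from the theorem with N = 52, n = 5, k = 13 hearts and x = 3. A minimal sketch, with `hypergeom_pmf` as an illustrative name:

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(x; N, n, k) = kCx * (N-k)C(n-x) / NCn"""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# 52 cards, 13 of them hearts, 5 cards dealt, exactly 3 hearts wanted.
p_three_hearts = hypergeom_pmf(3, N=52, n=5, k=13)

print(round(p_three_hearts, 4))
```

The numerator counts the hands with 3 of the 13 hearts and 2 of the 39 other cards; the denominator counts all 5-card hands.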

Exercise no. 1 A committee of size 5 is to be selected at random from
3 women and 5 men. Find the probability of,
a. no woman in the committee
b. all men in the committee
The mean and variance of a hypergeometric random variable are
similar to those of a binomial random variable, with a correction for
the finite population:

$\mu = n(k/N)$ (mean)

$\sigma^2 = n\cdot\frac{k}{N}\cdot\frac{N-k}{N}\cdot\frac{N-n}{N-1}$ (variance)

According to Mendenhall and Deck, Olsen in Introduction to Statistics and Data
Analysis, the mean and variance of the hypergeometric distribution p(x; N, n, k) are
$\mu = nk/N$ and $\sigma^2 = \frac{N-n}{N-1}\cdot n\cdot\frac{k}{N}\left(1-\frac{k}{N}\right)$

As with the binomial distribution, these agree with the general definitions of the
mean and variance of a random variable taking the values 0, 1, 2, ..., n. The mean is
$\mu_x = \sum x\,P(x) = 0\cdot p(0) + 1\cdot p(1) + 2\cdot p(2) + \cdots + n\cdot p(n)$ and the variance of X is

$\sigma^2 = \sum (x-\mu)^2 P(x) = (0-\mu)^2 p(0) + (1-\mu)^2 p(1) + (2-\mu)^2 p(2) + \cdots + (n-\mu)^2 p(n)$


If n is small relative to N, the probability of success changes only slightly from draw to draw.
Hence, we essentially have a binomial experiment and can approximate
the hypergeometric distribution by using the binomial distribution with p = k/N.
The mean and variance can then be approximated by the formulas
$\mu = np = nk/N$
$\sigma^2 = npq = n\cdot\frac{k}{N}\left(1-\frac{k}{N}\right)$
The exact hypergeometric variance multiplies npq by (N – n)/(N – 1), a correction factor
that is close to 1, and so negligible, when n is small relative to N.

Example. A case of wine has 12 bottles, 3 of which contain spoiled wine.
A sample of 4 bottles is randomly selected from the case.
1. Find the probability distribution of X = 0, 1, 2 bottles of
spoiled wine
2. What are the mean and variance of X?
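One way to work the wine example is to tabulate the whole distribution and compare the summation definitions of the mean and variance with the closed-form formulas. This sketch also lists X = 3, which is possible since the case holds 3 spoiled bottles; the variable names are illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

N, n, k = 12, 4, 3  # bottles in the case, sample size, spoiled bottles

# Probability of each possible number of spoiled bottles in the sample.
pmf = {x: hypergeom_pmf(x, N, n, k) for x in range(min(n, k) + 1)}

# Mean and variance from the summation definitions.
mean_by_sum = sum(x * p for x, p in pmf.items())
var_by_sum = sum((x - mean_by_sum) ** 2 * p for x, p in pmf.items())

# Mean and variance from the closed-form formulas.
mean_by_formula = n * k / N
var_by_formula = n * (k / N) * (1 - k / N) * (N - n) / (N - 1)

for x, p in pmf.items():
    print(x, round(p, 4))
print(round(mean_by_sum, 4), round(var_by_sum, 4))
```

Here n is a third of N, so the correction factor (N - n)/(N - 1) = 8/11 is far from negligible and the plain npq value would overstate the variance.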

Multivariate Hypergeometric Distribution

Theorem: If a population of size N can be partitioned into k cells
A1, A2, ..., Ak with a1, a2, ..., ak elements, respectively, then the
probability distribution of the random variables X1, X2, ..., Xk, representing
the number of elements selected from A1, A2, ..., Ak in a random sample
of size n, is

$P(x_1, x_2, \ldots, x_k; a_1, a_2, \ldots, a_k, N, n) = \frac{({}_{a_1}C_{x_1})({}_{a_2}C_{x_2}) \cdots ({}_{a_k}C_{x_k})}{{}_{N}C_{n}}$

Example: A gardener wishes to landscape a piece of property by
planting flowers across the front and rear of the house. From
a box containing 3 tulip, 4 daffodil, and 3 hyacinth bulbs, he selects
5 at random to be planted at the front of the house and the remaining
5 to be planted at the rear of the house. What is the probability that
1 tulip, 2 daffodils, and 2 hyacinths will bloom at the front of the house?
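The gardener example can be worked directly from the theorem. A minimal sketch, assuming every selected bulb blooms; `multivariate_hypergeom_pmf` is an illustrative name and `math.prod` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeom_pmf(selected, available):
    """Product of (a_i C x_i) over the cells, divided by N C n."""
    n = sum(selected)   # sample size
    N = sum(available)  # population size
    ways = prod(comb(a, x) for a, x in zip(available, selected))
    return Fraction(ways, comb(N, n))

available = [3, 4, 3]  # tulips, daffodils, hyacinths in the box
selected = [1, 2, 2]   # wanted at the front of the house

p_front = multivariate_hypergeom_pmf(selected, available)

print(p_front, round(float(p_front), 4))
```

The same call with two cells, such as lettuce and tomato seedlings, reduces to the ordinary hypergeometric formula.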

Prob. Exercise:
A homeowner wishes to plant different kinds of vegetables in his vegetable
garden. He selects 5 seedlings to be planted in his garden from a box that contains
5 lettuce and 4 tomato seedlings. What is the probability that he will plant 2 lettuce
and 3 tomato seedlings?

THE NORMAL DISTRIBUTION

The normal probability distribution, also known as the Gaussian distribution
after the mathematician and astronomer Carl Friedrich Gauss, is a continuous
distribution which is regarded as the most significant probability distribution
in the entire theory of statistics.

Properties of the normal curve/distribution

1. The mean, median, and mode have the same value and are plotted
at the central point along the horizontal line
2. The curve is symmetrical about the vertical line which contains
the mean
3. The curve is asymptotic relative to the horizontal line
4. The total area under the curve is 1.0 or 100%, and the horizontal line is
usually subdivided into at least three standard scores (3 SD's), each to the
left and to the right of the vertical axis.

The Empirical Rule – for a normal distribution, regardless of the value of the
mean and standard deviation,

68% of all observations lie within one SD of the mean
95.5% of all observations lie within two SD’s of the mean
99.7% of all observations lie within three SD’s of the mean
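The empirical-rule percentages can be recovered from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The computed areas round to the 68%, 95.5% and 99.7% quoted in the rule.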

The Normal Deviate

The Standardized normal distribution
There can exist an infinite number of possible normal distributions, each with its
own mean and standard deviation. To use a single table, any of these normal distributions
is converted to the standard normal distribution with the Z formula:

$Z = \frac{X-\mu}{\sigma}$ or $Z = \frac{X-\bar{X}}{s}$

Where: Z is the normal deviate and X is some specified value for the random variable.
Z measures the number of standard deviations an observation is from the mean.
After the conversion process, the mean of the distribution is 0 and the SD is 1.
Illustration: Telecom, a telephone answering service for business executives in Metro Bacolod,
has found that the average telephone message is 150 sec. with SD of 15 sec. The
length of a message is normally distributed. A particular phone message took 180
seconds. How many seconds is it longer than the average?

Solution. $Z = \frac{X-\mu}{\sigma} = \frac{180-150}{15} = 2$ SD’s, or 30 seconds above the mean

What is the probability that a single message takes between 150 sec. and 180 sec.?
Solution: $Z = \frac{180-150}{15} = 2$. From the normal curve table, Z = +/- 2.0 is equivalent to .4772
Telecom concludes that there is a 47.72% chance that any single telephone message will
last 150 sec. to 180 sec.
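The Telecom illustration can be reproduced without a printed table. This sketch computes the normal deviate and then the area between the mean and Z using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 150, 15   # message length in seconds
x = 180              # the particular message

# Normal deviate: how many SDs the observation lies from the mean.
z = (x - mean) / sd

# Area between the mean (Z = 0) and Z, which is what the normal curve table lists.
standard_normal = NormalDist()
area_mean_to_z = standard_normal.cdf(z) - standard_normal.cdf(0)

# The same area, taken directly on the original scale of seconds.
messages = NormalDist(mu=mean, sigma=sd)
area_150_to_180 = messages.cdf(180) - messages.cdf(150)

print(z, round(area_mean_to_z, 4), round(area_150_to_180, 4))
```

Both routes give the same area, which shows that standardizing changes the scale but not the probability.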

Exercises: Find the area under the normal curve. Use the normal curve table.

1. Between Z = -1.75 and Z = +2.85
2. Below Z = -2.75
3. Beyond Z = +2.20
4. Between Z = -2.25 and Z = -2.90

Problem Exercises:
1. A random sample of 1000 construction workers gave their average daily wage at
P420 with SD of P35. Assuming that daily wages to be normally distributed,
a. what is the probability that a worker selected at random earns between
P490 and P380 a day?
b. how many workers earn less than P450 a day?
c. if workers who earn P480 and above are asked to contribute P70 for a sick
co-worker, how much is the expected contributions?

2. A study of prevailing market prices for one day shows that the average price of rice
per kilo is P40.00 with SD of P1.50. Assuming that prices are normally distributed,
a. What percentage of rice sells at higher than P43.00 per kilo?
b. If in a particular market, 1000 sacks of rice were sold, how many kilos were sold
at less than P42.00 per kilo? (1 sack = 50 kilos)
c. What average price per kilo should the government try to maintain so that
80% of rice sells at not more than P42.50 .
3. The average life of a certain type of motor is 10 years. The manufacturer replaces
free all motors that fail while under guarantee. If he is willing to replace only 3%
of the motors that fail, how long a guarantee should he offer? SD is 2 years.
4. Assume that heights of women in a population follow a normal curve with mean of
64.3 inches and SD of 2.6 inches.
a. What proportion of women stand between 60 inches and 66 inches?
b. A certain woman has a height of 0.5 SD above the mean. What proportion
of women are taller than she?
5. A distribution of test scores in Statistics follows a normal distribution with mean of
80 and std. deviation of 12. There are 120 students who took the test.
a. How many scores do you expect to find above 100?
b. How many scores do you expect to find between 90 and 110?
6. A soft drink machine is regulated so that it discharges an average of 200 ml
per cup. If the amount of drink is normally distributed with standard deviation
of 15 ml,
a. What fraction of the cups will contain more than 225 ml?
b. How many cups will likely overflow if 235 ml cups are used in the next
1000 drinks?
c. Below what value do we get the smallest 20% of the drinks?

Correlation and Regression Analysis

Definition: Correlation is the measure of relationship between or among variables.
Coefficient of correlation, r, is the index of relationship between variables.
It measures the strength of relationship or association between variables.
These variables include independent variables and a dependent variable.

The values of the coefficient of correlation are between -1.0 and +1.0.
If r is +1.0, it indicates that the two variables are perfectly related in
a positive sense, which means, if X increases, Y also increases; if X decreases,
Y does likewise. If r is negative, it indicates that X and Y are inversely related,
meaning, if X increases, Y decreases.

Table of Coefficient of Correlation

Coef. Of Correlation, r Interpretation

r = 0.00 No correlation
r from +/- .01 to +/- .19 Negligible correlation
r from +/- .20 to +/- .39 Low Correlation
r from +/- .40 to +/- .59 Moderate Correlation
r from +/- .60 to +/- .79 Moderately High Correlation
r from +/- .80 to +/- 1.00 High Correlation

Techniques of Correlation

1. Spearman Rank-Order Correlation

$r = 1 - \frac{6\sum d^2}{N(N^2-1)}$

Where: 6 = constant
N = no. of pairs
d = difference between ranks
Example: Relationship between Height and Weight of persons

2. Pearson Product-Moment (PPM)

$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$

3. Deviation from mean

$r = \frac{\sum (dx \cdot dy)}{\sqrt{(\sum dx^2)(\sum dy^2)}}$

Where: $dx = X - \bar{X}$, $dy = Y - \bar{Y}$
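The Pearson product-moment and deviation-from-mean formulas describe the same quantity, which a short sketch can confirm. The data are the machine age and repair cost figures from the first illustration that follows; the function names are illustrative.

```python
from math import sqrt

age = [2, 4, 5, 7, 8, 10]                   # years
repair_cost = [1.0, 2.2, 2.5, 3.0, 4.5, 5.0]  # thousand pesos

def pearson_r(xs, ys):
    """Pearson product-moment formula using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

def pearson_r_from_deviations(xs, ys):
    """Deviation-from-mean formula: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

r = pearson_r(age, repair_cost)
r_dev = pearson_r_from_deviations(age, repair_cost)

print(round(r, 4), round(r_dev, 4))
```

On the interpretation table above, an r of this size falls in the "High Correlation" band.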

Illustration:
1. The relationship between AGE of machines and its REPAIR costs

Age in Years    Repair costs in P10^3
2               1.0
4               2.2
5               2.5
7               3.0
8               4.5
10              5.0

Find the Coef. of Correlation and interpret

2. Relationship between pressure and volume of a confined gas

V (cm^3)       50    60    70    90    100   110
P (kg/cm^3)    64.7  51.3  40.5  25.9  15.5  10.2

Calculate the coef. of correlation, r and interpret


3. Ten candidates for Miss Philippines were ranked by Judge X and Judge Y
according to beauty and talent. Find if the choice of Judge X is consistent
with the choice of Judge Y

Candidates:  A  B  C  D  E  F  G   H   I  J
Judge X :    2  3  1  6  4  5  10  7   8  9
Judge Y :    2  1  3  5  6  4  8   10  9  7

4. Relationship between Height and Weight of seven students selected at random

Height in ft. and inches    Weight in lbs.
5 ft and 8 in.              150 lbs.
4 ft and 10 in.             110 lbs.
5 ft and 9 in.              140 lbs.
5 ft and 5 in.              130 lbs.
5 ft and 4 in.              120 lbs.
5 ft and 2 in.              90 lbs.
5 ft and 0 in.              100 lbs.
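For data that are already ranks, such as the two judges' rankings, the Spearman rank-order formula applies directly. A minimal sketch, with `spearman_rank_r` as an illustrative name; it assumes there are no tied ranks, as in this data.

```python
judge_x = [2, 3, 1, 6, 4, 5, 10, 7, 8, 9]
judge_y = [2, 1, 3, 5, 6, 4, 8, 10, 9, 7]

def spearman_rank_r(ranks_a, ranks_b):
    """r = 1 - 6*sum(d^2) / (N*(N^2 - 1)), for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

# Sum of squared rank differences, shown separately for checking by hand.
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_x, judge_y))
r_rank = spearman_rank_r(judge_x, judge_y)

print(sum_d2, round(r_rank, 3))
```

Measured data such as the heights and weights would first have to be converted to ranks before this function is used.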

Limitations of Rank-Order Correlation

It takes into account only the rank position of the items in the series and makes no
allowance for gaps between adjacent measures.

Advantages:
1. Rank Order – provides a convenient way of estimating the coef. of correlation if
N is small
2. Pearson Product-Moment (PPM) – takes into account the absolute size of
the measures and not merely their rank position
Coefficient of Determination, r^2, expresses the proportion of total variation in Y that
can be accounted for or explained by the independent variable X.
Thus, if r = .60, r^2 = .36, meaning 36% of the variation in Y is accounted for by X

Regression Analysis

Definition. Regression is a quantitative expression of the basic nature of the relationship
between X and Y (Independent and Dependent variable). It determines the
change in Y given a change in X.

Correlation measures the strength of relationship between X and Y. If X and Y are
related in a linear manner, then as X changes by a constant amount, Y also changes
by a constant amount.

Linear Regression Equation

$\hat{Y} = a + bX$

Where: $\hat{Y}$ = estimated value of Y
a = y-intercept, that is, the value of Y when X = 0
b = slope, i.e. change in Y for every unit change in X
X = independent variable

where: $b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$ and $a = \bar{Y} - b\bar{X}$

thus, $\hat{Y} = a + bX \pm SE_{est}$

SEest is the standard error of estimate. It measures the dispersion about an average
line called the regression line.

$SE_{est} = \sqrt{\frac{\sum Y^2 - a(\sum Y) - b(\sum XY)}{n-2}}$
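The slope, intercept and standard error of estimate can be computed straight from the sums in the formulas. The sketch below reuses the six-machine age and repair cost data from the correlation illustration; the function names and the prediction at 6 years are illustrative choices.

```python
from math import sqrt

age = [2, 4, 5, 7, 8, 10]                   # X, years
repair_cost = [1.0, 2.2, 2.5, 3.0, 4.5, 5.0]  # Y, thousand pesos

def fit_line(xs, ys):
    """Return (a, b) for Y-hat = a + bX using the raw-sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = sum_y / n - b * sum_x / n  # a = Y-bar - b * X-bar
    return a, b

def standard_error_of_estimate(xs, ys, a, b):
    """SEest = sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2))"""
    n = len(xs)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

a, b = fit_line(age, repair_cost)
se_est = standard_error_of_estimate(age, repair_cost, a, b)

# Estimated repair cost of a 6-year-old machine, in thousand pesos.
estimate_at_6 = a + b * 6

print(round(a, 4), round(b, 4), round(se_est, 4), round(estimate_at_6, 3))
```

Because the subtraction in SEest cancels most of the digits, a and b should be carried to several decimal places before this formula is applied by hand.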

Exercises:
1. An assembly plant wanted to study the relationship between the age of a machine and
its repair cost. The following data represent the Age in years and Repair costs of a
random sample of 8 machines.

Age in Years    Repair costs (P10^3)

5 3.5
6 5.0
7 5.0
9 5.2
12 6.0
13 6.0
15 6.2
16 7.1
a. Determine the coefficient of correlation, r
b. Estimate the repair cost of a ten- year old machine

2. The data below represent the electrical energy consumption in KWH and the
amount due over a period of 6 months.

Energy Consumption (KWH)    Amount Due (P10^3)
83 0.853
160 1.425
190 1.732
153 1.505
147 1.547
170 1.655
a. Calculate the coefficient of correlation ,r
b. Estimate the amount due for 200 KWH consumption

3. Data below represent the supply of product A and its price per unit

Price per unit (P)    Supply (10^3 units)
25 100
20 120
30 80
25 110
35 90
30 100
40 75

3a. Find the coef. of correlation by PPM


3b. Estimate the price per unit when supply is 150,000 units

4. The marketing research dept. of A-1, Inc. wanted to study the relationship
between the Advertising expenditures and Sales volume of a certain product.
Estimate the sales volume for an advertising expense of P50,000

Ad Expense (P10^3)    Sales Volume (P10^3)


5 40
7 45
10 60
12 65
15 75
20 80
25 95

5. Relationship between the teaching performance and tenure in years of ten teachers

Teaching Tenure in
Performance, % Years

84 4
86 6
90 14
87 8
92 15
94 12
95 16
80 5
85 7
88 10

5a. Calculate for the coef. of correlation by PPM


5b . Estimate the performance rating of a teacher who has been in the
profession for 18 years

Multiple Regression Analysis

Multiple Regression Equation: $\hat{Y} = a + b_1X_1 + b_2X_2$

Sub-equations:
(1) $\sum Y = na + b_1\sum X_1 + b_2\sum X_2$
(2) $\sum X_1Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$
(3) $\sum X_2Y = a\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$

Problem illustration:
Mr. de los Santos has been concerned for some time with the overhead
costs in his furniture shop. For the last 7 months he has kept a record not only of the
direct labor hours in the shop but also of the total costs of lumber used in the operation.
The data are found in the following table:

Month    Overhead Costs (P10^3)    Hrs./mo. (10^3)    Lumber Used (10^3)
Jan      3.1                       0.39               2.4
Feb      2.6                       0.36               2.6
Mar      2.9                       0.38               2.3
Apr      2.7                       0.39               1.9
May      2.8                       0.37               1.9
June     3.0                       0.39               2.1
July     3.2                       0.38               2.4

1. What is the dependent variable?
2. What are the independent variables?
3. Find the values of a, b1, and b2
4. Find the overhead costs when labor equals .40 (10^3 hrs/month) and
lumber used is P2500
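The three sub-equations form a 3 x 3 linear system in a, b1 and b2, which can be solved by elimination. The sketch below builds the system from the furniture shop table and solves it with a small hand-written Gaussian elimination; it assumes overhead cost is Y, labor hours are X1 and lumber used is X2, and that the last question corresponds to X1 = 0.40 and X2 = 2.5 in the table's units. No results are quoted here; the residual checks only confirm that the solution satisfies the sub-equations.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

y = [3.1, 2.6, 2.9, 2.7, 2.8, 3.0, 3.2]          # overhead costs
x1 = [0.39, 0.36, 0.38, 0.39, 0.37, 0.39, 0.38]  # labor hours
x2 = [2.4, 2.6, 2.3, 1.9, 1.9, 2.1, 2.4]         # lumber used
n = len(y)

# Coefficients of a, b1, b2 in sub-equations (1), (2), (3).
normal_matrix = [
    [n,       sum(x1),     sum(x2)],
    [sum(x1), dot(x1, x1), dot(x1, x2)],
    [sum(x2), dot(x1, x2), dot(x2, x2)],
]
normal_rhs = [sum(y), dot(x1, y), dot(x2, y)]

a, b1, b2 = solve_linear_system(normal_matrix, normal_rhs)

# Assumed reading of question 4: X1 = 0.40 and X2 = 2.5 in table units.
predicted_overhead = a + b1 * 0.40 + b2 * 2.5

residuals = [yi - (a + b1 * u + b2 * v) for yi, u, v in zip(y, x1, x2)]
print(a, b1, b2, predicted_overhead)
```

With more than two independent variables the same construction applies; only the size of the system grows.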

TESTS OF HYPOTHESIS

Inferential or sampling statistics is useful in generalizing about populations from a small sample.
In many instances a researcher can only rely on the information provided by the sample.

Hypothesis is an educated guess or a tentative answer to a question. It is a statement about
an expected relationship between two or more variables that can be empirically tested.

Kinds of Hypothesis

1. Ho: Null hypothesis or statistical hypothesis is a negative statement which indicates the
absence of relationship/correlation between two variables; an absence of a significant
difference between proportions of two groups; or an absence of significant difference
between means of two groups

Example: There is no significant relationship between A and B

2. Ha: Alternative hypothesis or research hypothesis is a positive form of the null
hypothesis. It may state the presence of a significant relationship between the
independent and dependent variables, or the presence of a significant difference
between two means or two proportions

Example: There is a significant relationship between A and B

3. Directional hypothesis states whether the relationship between two variables is
direct or positive, or inverse or negative. A positive or direct relationship is present
when the value of one variable increases with the increase in the value of another.
The relationship is negative when the value of one variable increases as the value
of another decreases.

Example: The taller, the heavier X > Y ; A < B

4. Non-directional hypothesis does not specify the direction of relationship between
variables. It merely states the presence or absence of a relationship between two
variables or that one variable influences the other variable.
Example: There is a significant difference between the performance rating
of students who attended the review class and those who did not

A ≠ B ; X ≠ Y

Level of Significance, α
The significance level of a test is the maximum value of the probability of
rejecting the null hypothesis when in fact it is true.
A 5% level of significance, α = .05, means that you could be wrong with a
probability of 5%. It implies that you are 95% confident that you have
made the right decision of accepting or rejecting the hypothesis.

A 1% significance level, α = .01, means that you could be wrong with a
probability of 1%. It implies that you are 99% confident that you have made
the right decision.

Rejection of the null hypothesis (Ho) implies acceptance of the
alternative hypothesis (Ha).
Acceptance of the null hypothesis (Ho) implies rejection of the alternative
hypothesis (Ha).

Ho: Null hypothesis – statement of non-significance of the difference between X and Y
Ho: X = Y

Ha: Alternative hypothesis – statement of the significant difference between X and Y
Ha: X ≠ Y (non-directional alternative hypothesis). This is a two-tailed test
Ha: X > Y or X < Y (directional alternative hypothesis). This is a one-tailed test

Example: Ho: There is no significant difference between X and Y (X = Y)
Ha: There is a significant difference between X and Y (X ≠ Y)
Ha: X is significantly greater than Y (X > Y) or (Y < X)

Tests of Hypothesis

1. Z-test is used when the population standard deviation is known. Sample size is 30 or
more (n is 30 or more, Walpole 3rd ed.)
2. t-test is used when the population standard deviation is unknown and only the sample
standard deviation is available. Sample size is less than 30

Uses of Z-test:

1. Sample mean is compared with population mean

$Z = \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma}$

µ = population mean, $\bar{X}$ = sample mean
n = sample size, σ = population standard deviation
Note: For the Z-test, use the table of critical values of Z based on the area under the
normal curve.

2. Comparing two sample means

$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

n1 = first sample size, n2 = second sample size

3. Comparing two sample proportions

$Z = \frac{P_1 - P_2}{\sqrt{\frac{P_1Q_1}{n_1} + \frac{P_2Q_2}{n_2}}}$

P1 = proportion of first sample, Q1 = 1 – P1
P2 = proportion of second sample, Q2 = 1 – P2

Accept Ho (reject Ha) if the absolute value of the computed Z is less than the table
value.
Reject Ho (accept Ha) if the absolute value of the computed Z is equal to or greater
than the table value.
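The three Z formulas and the decision rule above can be sketched in Python as follows. The function names and all input numbers are hypothetical values chosen for illustration, not data from these notes. The table value 1.96 assumes a two-tailed test at 5% alpha.

```python
from math import sqrt

def z_one_sample(sample_mean, pop_mean, sigma, n):
    """Z for comparing a sample mean with the population mean."""
    return (sample_mean - pop_mean) * sqrt(n) / sigma

def z_two_means(mean1, mean2, sigma, n1, n2):
    """Z for comparing two sample means with a common sigma."""
    return (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))

def z_two_proportions(p1, n1, p2, n2):
    """Z for comparing two sample proportions (Q = 1 - P)."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

def decide(z, table_value):
    """Reject Ho when |Z| is equal to or greater than the table value."""
    return "Reject Ho" if abs(z) >= table_value else "Accept Ho"

# Hypothetical inputs, for illustration only.
z1 = z_one_sample(sample_mean=52, pop_mean=50, sigma=6, n=36)
z2 = z_two_means(mean1=80, mean2=77, sigma=5, n1=50, n2=50)
z3 = z_two_proportions(p1=0.5, n1=100, p2=0.4, n2=100)

for z in (z1, z2, z3):
    print(round(z, 3), decide(z, table_value=1.96))
```

With these inputs the first two statistics exceed 1.96 and lead to rejecting Ho, while the third (about 1.43) does not.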

Uses of t-test:

1. Test of population mean

$$t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$$

where s = sample standard deviation

2. Test of two sample means (n1 ≠ n2)

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

3. Test of two samples, n1 = n2

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum d^2}{n(n - 1)}}}$$
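A minimal Python sketch of the three t formulas above. It rests on one assumption: ∑d² in the equal-sample-size formula is read as the sum of squared deviations of each observation from its own sample mean, taken over both samples. All function names and input numbers are hypothetical, for illustration only.

```python
from math import sqrt

def t_one_sample(sample_mean, pop_mean, s, n):
    """t for testing a population mean, using sqrt(n - 1) as in the notes."""
    return (sample_mean - pop_mean) * sqrt(n - 1) / s

def t_two_means(mean1, s1, n1, mean2, s2, n2):
    """t for two sample means with unequal sample sizes."""
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))

def t_equal_n(x1, x2):
    """t for two samples of equal size n.

    Assumption: sum of d^2 is the total of squared deviations from
    each sample's own mean, across both samples.
    """
    n = len(x1)
    if len(x2) != n:
        raise ValueError("both samples must have the same size")
    m1, m2 = sum(x1) / n, sum(x2) / n
    sum_d2 = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    return (m1 - m2) / sqrt(sum_d2 / (n * (n - 1)))

# Hypothetical inputs, for illustration only.
print(round(t_one_sample(sample_mean=53, pop_mean=50, s=6, n=17), 3))
print(round(t_two_means(mean1=20, s1=3, n1=10, mean2=17, s2=4, n2=12), 3))
print(round(t_equal_n([5, 7, 9], [2, 4, 6]), 3))
```

The computed t is then compared with the table value of t in the same way as the Z decision rule.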

Problem Exercises:

1. Two types of wire are being compared for strength. Fifty pieces of each type of wire are
tested under similar conditions. Type A has an average tensile strength of 78.3 Nt, while
Type B has an average tensile strength of 87.2 Nt. The combined standard deviation of the
wires is 5.6 Nt. Which type of wire is stronger? Test at 1% alpha.
2. A sample survey of TV programs in Metro Bacolod shows that 80 of 200 men prefer
watching NBA. From another sample group, 75 of 250 prefer watching PBA. What is
the preference of men? Test at 5% alpha.

3. A cigarette manufacturer claims that his cigarettes have an average nicotine content of
1.83 mg and a standard deviation of .11 mg. If a random sample of 50 cigarettes of this
type has an average nicotine content of 1.90 mg, will you agree with the claim of the
manufacturer? Test at 1% alpha.

4. A researcher wishes to find out whether or not there is a significant difference between
the weekly pay of night and day shiftees of a certain company. By random sampling, she
selected 25 day and 27 night shiftees and computed their mean weekly pay and
standard deviations. The day shiftees have a mean weekly pay of P1575 with a standard
deviation of P55. The night shiftees have a mean weekly pay of P1850 and a standard
deviation of P65. Do the night shiftees earn significantly more than the day
shiftees? Test at 5% alpha.

5. To determine whether membership in a campus club is beneficial or detrimental to
one's GPA, the following GPAs were collected over a period of 5 years. Test at 1%
alpha.

Year   X1 (Member)   X2 (Non-member)   d1 (X1 − mean X1)²   d2 (X2 − mean X2)²

1      2.0           1.9
2      2.0           1.9
3      2.3           2.0
4      2.1           2.1
5      2.4           2.3

6. Data from the subdivision survey show that the average monthly electrical
consumption of residential homes is 150 KWH with a standard deviation of 18 KWH. A
sample of 70 residential homes was selected randomly and was found to consume on
average 190 KWH. Are the 70 residences consuming significantly more than the
rest? Test at 2.5% alpha.

7. Alpha Company manufactures steel cable with an average tensile strength of 150 Nt.
The laboratory tested 30 pieces and found them to have an average tensile strength of
145 Nt and a standard deviation of 6.5 Nt. Is the result of the laboratory in accordance with

8. Two types of thread are compared for strength. Twenty five pieces of each type of
thread are tested under similar conditions. Type X has an average strength of 78.3
Kg, and standard deviation of 5.6 Kg. Type Y has an average strength of 87.2 Kg
and standard deviation of 6.2 Kg. Which type of thread is stronger? Test at 1%
level of significance.

9. A manufacturer of light bulbs claims that his light bulbs burn on average
500 hrs. To maintain this average, he tests 25 bulbs each month. If the computed
t value falls between −t0.05 and t0.05, he is satisfied with his claim. What conclusion
should he draw from a sample that has a mean of 518 hours and a standard deviation of
40 hrs? Assume the distribution of burning times to be approximately normal.

10. The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a
standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0
years and a standard deviation of 0.8 year. What is the probability that a random sample of 36
tubes from manufacturer A will have a mean lifetime that is at least one year more than
the mean lifetime of a sample of 49 tubes from manufacturer B?

CHI SQUARE ANALYSIS (χ²)

Chi square is used as a test of significance when data are expressed in frequencies,
or in percentages or proportions that can be reduced to frequencies.
Chi square applies to discrete data; however, continuous data may be reduced to
categories and tabulated so that chi square may be applied.
Example: Scores on a test of mental ability and a dexterity test could be tabulated into
a contingency table.
Dexterity Test Score

Mental Ability Scores 12 - 20 21 – 29 30 - 38


140 and up none 1 3
120 – 139 2 5 2
100 - 119 5 3 1
80 - 99 3

To use the chi square statistic, the data must be independent, i.e., no response
is related to any other response. The categories into which data are placed must also be
mutually exclusive, i.e., each frequency must be placed in one and only one category.
Finally, all the observed data must be used in a chi square problem.

Formula for Chi Square:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where: fo = observed/actual cell frequency
       fe = expected or theoretical frequency
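The formula can be sketched as a small Python function. The name `chi_square` is an illustrative choice. The sample call uses the coin-toss frequencies that appear later in these notes (4 heads and 6 tails observed, 5 and 5 expected).

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Coin tossed 10 times: fo = 4 heads, 6 tails; fe = 5 each.
stat = chi_square([4, 6], [5, 5])
print(round(stat, 3))
```

Each cell contributes (1)²/5 = 0.2 here, so the statistic is 0.4.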

Classification of Data

1. One-way classification - has one variable described by at least two categories

   Example:  Civil Status    frequency (f)
             Single          5
             Married         8
             Widowed         6
             Separated       3

             df = no. of categories − 1
             df = 4 − 1 = 3
2. Two-way classification - has two variables each described by at least two
categories
   Example:
                 Attitude Towards Charter Change
   Gender     Favor          Against        Undecided
              fo    fe       fo    fe       fo    fe       Total
   Male       15    15.09     8              2              25
   Female     20             10              3              33
   Total      35             18              5              58

Where: fo = observed/actual frequency
       fe = expected/theoretical frequency

$$f_e = \frac{R \times C}{N}$$

R = row total; C = column total; N = grand total

Example: fe = (25 × 35)/58 = 15.09

df = (r − 1)(c − 1)     r = no. of rows; c = no. of columns

Example: df = (2 − 1)(3 − 1) = 2
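As a sketch of how fe = (R × C)/N and df = (r − 1)(c − 1) apply to every cell, the Python below fills in the expected frequencies for the Gender by Attitude table above. The function names are illustrative assumptions.

```python
def expected_frequencies(table):
    """Expected frequency of each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """df = (number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Observed frequencies: rows = Male, Female; columns = Favor, Against, Undecided.
observed = [[15, 8, 2],
            [20, 10, 3]]

for row in expected_frequencies(observed):
    print([round(fe, 2) for fe in row])
print("df =", degrees_of_freedom(observed))
```

The expected frequencies in each row add up to that row's total (25 and 33), which is a quick check on the arithmetic.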

Uses of Chi Square:

1. To test the “goodness of fit”, i.e., to find out whether or not a sample
distribution conforms to a hypothetical/ideal distribution (for example, a normal curve)
Example: Tossing of a coin 10 times
fo fe
Head 4 5
Tail 6 5

2. To find out whether or not an observed proportion is equal to some given
ideal proportion
Ex. A doctor claims that a particular drug can reduce weight. Out of 30 persons
who took the drug, 18 reduced in weight. If the ideal proportion is 75%, can
we conclude that the drug is effective in reducing weight? Test at 5% alpha.

                              fo      fe
Reduced in weight             18      22.5
Did not reduce in weight      12       7.5

3. To test the independence of one variable from another variable.

Is the employment of new graduates independent of the school graduated from?

                  Status
School      Hired     Not hired     Total

SUC         175       125           300
PUC         140        60           200
______________________________________
Total       315       185           500

Ho: Hiring of new graduates is independent of school graduated from

Ha: Hiring of new graduates is dependent on the school graduated from
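For the second use listed above (testing an observed proportion against an ideal proportion), the computation can be sketched with the weight-reduction example: 18 of 30 persons reduced in weight, with an ideal proportion of 75%. The function name is hypothetical, and the expected counts are taken as n × p without rounding. The table value 3.841 is the chi square value for df = 1 at 5% alpha.

```python
def proportion_chi_square(successes, n, ideal_proportion):
    """Chi square for an observed proportion against an ideal proportion."""
    observed = [successes, n - successes]
    expected = [n * ideal_proportion, n * (1 - ideal_proportion)]
    stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return expected, stat

expected, stat = proportion_chi_square(successes=18, n=30, ideal_proportion=0.75)
table_value = 3.841  # df = 2 categories - 1 = 1, alpha = 5%

print("fe =", expected)
print("chi square =", round(stat, 3))
print("Reject Ho" if stat >= table_value else "Accept Ho")
```

Under these assumptions the statistic is 3.6, which is below 3.841, so the observed proportion does not differ significantly from the ideal 75% at 5% alpha.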

Problem Examples:

1. A public opinion poll was conducted on attitudes towards women in the military.
Some 113 subjects were interviewed. The question asked was
“Do you favor women in the military?” Test at 2% alpha.
Attitude

Gender YES NO DON’T KNOW TOTAL

Male 30 20 5 55
Female 38 18 2 58
__________________________________________________
Total 68 38 7 113

2. In 100 tosses of a coin, 63 heads and 37 tails are observed. Is this a balanced coin?
Test at 1% alpha.

3. In an experiment to study the dependence of hypertension on smoking habits, the
following data were taken on 180 individuals.

Non smokers Moderate smokers Heavy smokers Total

With hypertension 21 36 30
No hypertension 48 26 19
_________________________________________________________

Test the hypothesis that the absence or presence of hypertension is
independent of smoking habits. Test at 5% alpha.

4. A marketing research department has divided a certain sales region into six
districts. It is believed that all districts have the same sales potentials. The number
of units sold in specified districts are given. Test whether or not the six districts
have equal sales potential. Test at 2% alpha

District Units sold

1 12
2 18
3 15
4 25
5 22
6 28

Note: Chi square for a 2x2 table, with df = (2-1)(2-1) = 1, can be computed without
computing fe:

   a    c    k
   b    d    l
   m    n    N

$$\chi^2 = \frac{N(ad - bc)^2}{klmn}$$

Example: Is the hiring of graduates independent of the school graduated from?

SCHOOL     HIRED       NOT HIRED     TOTAL

SUC        175 (a)     125 (c)       300 (k)
PUC        140 (b)      60 (d)       200 (l)
TOTAL      315 (m)     185 (n)       500 (N)

Ho: Hiring of graduates is independent of school graduated from
Ha: Hiring of graduates is dependent upon the school graduated from

Solution:

$$\chi^2 = \frac{N(ad - bc)^2}{klmn} = \frac{500(175 \times 60 - 140 \times 125)^2}{300 \times 200 \times 315 \times 185} = 7.007$$

df = (2-1)(2-1) = 1; α = 5%; Table, App. D: χ²(.05) = 3.841

Since 7.007 > 3.841, Ho: Rejected, Ha: Accepted

Conc. Hiring of graduates is dependent upon the school graduated from
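As a check on the shortcut, the sketch below computes chi square for the hiring table in two ways: with the 2x2 formula N(ad − bc)²/(klmn), and with the general formula based on expected frequencies. The function names are illustrative. The cell labels follow the layout used above (a and c in the first row, b and d in the second).

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut for a 2x2 table laid out as rows (a, c) and (b, d)."""
    k, l = a + c, b + d          # row totals
    m, n = a + b, c + d          # column totals
    grand_total = k + l
    return grand_total * (a * d - b * c) ** 2 / (k * l * m * n)

def chi_square_general(table):
    """General formula: sum of (fo - fe)^2 / fe with fe = R x C / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total
            stat += (fo - fe) ** 2 / fe
    return stat

shortcut = chi_square_2x2_shortcut(a=175, b=140, c=125, d=60)
general = chi_square_general([[175, 125], [140, 60]])
print(round(shortcut, 3), round(general, 3))
```

Both routes give about 7.007, which is greater than the table value of 3.841.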

Note: For other types of table, say 2x3, 3x3, 3x4, 4x5, 5x5
df = (row -1)(column -1)
fe =( RxC )/N

Problem Exercise

A study was conducted to determine the relationship between sales and the location
of fast-food outlets.

                      Sales
Location              Low     Average     High     Total

Near School           30      25          40
Near Church           25      20          42
Near Movie House      28      35          43

ANALYSIS OF VARIANCE
(ANOVA)

Analysis of Variance is a widely used and highly developed parametric statistical
method for comparing three or more means.
Assumptions Underlying the Use of ANOVA

1. The individuals in the group and subgroups are selected randomly from a normally
distributed population
2. The samples that constitute the groups are independent

ONE- WAY CLASSIFICATION ANOVA

Sources of Variation:

1. Variance between groups – Sum of Squares Between Groups (SSB)


2. Variance within groups - Sum of Squares Within Groups (SSw)

Illustration: Distance traveled by cars with one liter of gasoline

Distance traveled, kilometers

CAR      XA      XB      XC      XA²     XB²     XC²

1        12      18       6      144     324      36
2        18      17       4      324     289      16
3        16      16      14      256     256     196
4         8      18       4       64     324      16
5         6      12       6       36     144      36
6        12      17      12      144     289     144
7        10      10      14      100     100     196
_________________________________________________________________
Total    82     108      60     1068    1726     640
Mean     11.71   15.43    8.57

∑X² = 1068 + 1726 + 640 = 3434

∑X = 82 + 108 + 60 = 250

Ho: No significant difference between the distance traveled by cars with one liter of
gasoline
Ha: There is a significant difference between the distance traveled by cars with one
liter of gasoline

Steps: 1. TSS = ∑X² − (∑X)²/N

          Correction Factor: C.F. = (∑X)²/N = (250)²/21 = 2976.19

          TSS = 3434 − 2976.19 = 457.81

       2. SSB = Variance between groups

          SSB = ∑[(∑X)²]/No. of rows − C.F.

          SSB = {(82)² + (108)² + (60)²}/7 − 2976.19

              = 3141.14 − 2976.19 = 164.95

       3. SSw = Variance within the group

          SSw = TSS − SSB = 457.81 − 164.95 = 292.86

ANOVA TABLE
Source of Variation    Sum of Squares    df                               Mean Square
Between groups         SSB = 164.95      dfB = c − 1 = 3 − 1 = 2          MSSB = SSB/dfB = 164.95/2 = 82.48
Within groups          SSw = 292.86      dfw = dfT − dfB = 20 − 2 = 18    MSSw = SSw/dfw = 292.86/18 = 16.27
Total variance         TSS = 457.81      dfT = N − 1 = 21 − 1 = 20

F-test = MSSB/MSSw
       = 82.48/16.27
       = 5.07
F-test is interpreted with the use of the F table
 At 5% alpha, F-table value is 3.55
 At 1% alpha, F-table value is 6.01
Thus, at 5% alpha, F-test = 5.07 > 3.55, Ho: Rejected, Ha: Accepted
and at 1% alpha, F-test = 5.07 < 6.01, Ho: Accepted, Ha: Rejected

Conclusion at 5% alpha:

There is a significant difference between the distance traveled by cars
with one liter of gasoline

Conclusion at 1% alpha:

There is no significant difference between the distance traveled by cars with
one liter of gasoline
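The steps of the one-way ANOVA illustration can be sketched in Python using the same car data. The function name and the returned dictionary keys are illustrative assumptions. The code carries full precision, so the last digit may differ slightly from hand computation with rounded intermediate values.

```python
def one_way_anova(groups):
    """One-way ANOVA using the correction-factor method from the steps above."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total              # C.F. = (sum X)^2 / N
    tss = sum(x ** 2 for x in all_values) - correction       # total sum of squares
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # between groups
    ssw = tss - ssb                                          # within groups
    df_b = len(groups) - 1
    df_w = (n_total - 1) - df_b
    f_value = (ssb / df_b) / (ssw / df_w)
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "dfB": df_b, "dfW": df_w, "F": f_value}

# Distance traveled (km) per liter of gasoline for cars A, B and C.
car_a = [12, 18, 16, 8, 6, 12, 10]
car_b = [18, 17, 16, 18, 12, 17, 10]
car_c = [6, 4, 14, 4, 6, 12, 14]

result = one_way_anova([car_a, car_b, car_c])
for name, value in result.items():
    print(name, round(value, 2))

# F-table values quoted in the notes for df = (2, 18).
print("5% alpha:", "Reject Ho" if result["F"] > 3.55 else "Accept Ho")
print("1% alpha:", "Reject Ho" if result["F"] > 6.01 else "Accept Ho")
```

The computed F falls between the 5% and 1% table values, which matches the two conclusions above.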

Problem Exercises:

(1) Ratings of Trainees by Four Supervisors (Scale of 10)

Trainees Raters
A B C D
1 10 6 8 7
2 4 5 3 4
3 8 4 7 4
4 3 4 2 2
5 6 8 6 7
6 9 7 8 7
(2) In an experiment designed to compare the effects of coaching on the scores obtained on
an aptitude test used for entrance to a professional school, three levels of coaching were
used: none, 4 hours, and 12 hours. A random sample of 18 applicants was chosen from
the population of applicants. Their scores are given below: Test at 5% alpha.

Scores Obtained

Group None 4 Hours 12 Hours

1 30 32 35
2 27 30 33
3 26 29 32
4 24 27 30
5 22 25 27
6 20 24 26
State: Ho: Coaching has no significant effect on the aptitude test score of applicants to
professional schools
Ha: Coaching has significant effect on the aptitude test score of applicants to
professional schools
