QMM 4

Quantitative Methods for Management
Descriptive Statistics: Numerical Measures
DAY 4
Recap
Day 1 Introduction, types of statistics, data and its types
Definition of statistics; terminology: population, sample, parameter, statistic; qualitative and quantitative data; levels of measurement: nominal, ordinal, interval and ratio; sources of data: primary and secondary; applications of statistics in the various functions of management; data mining and data warehousing
Day 2 Classification of data: qualitative, quantitative, geographical and chronological; presentation of data: frequency distribution, relative and cumulative frequencies; bivariate distributions
Diagrammatic: bar diagram, pie diagram
Graphical: histogram, frequency polygon, ogive
Exploratory data analysis: scatter diagram, stem-and-leaf plot
Day 3 Numerical measures: central tendency (mean, percentiles and mode); dispersion (range, interquartile range and MAD)
Population standard deviation and variance; sample standard deviation, variance and coefficient of variation
Day 4
Distribution shape: skewness
Relative location: z-score
Detecting outliers
Exploratory data analysis
Five-number summary
Box plot: graphical representation of the five-number summary
Measures of association between variables
Covariance
Correlation
Grouped data
Weighted mean
Distribution Shape: Skewness
An important measure of the shape of a distribution
is called skewness.
The formula for the skewness of sample data is
$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Skewness can be easily computed using statistical software.
A histogram provides a graphical display showing the shape of a distribution.
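As a minimal sketch (Python standard library only; the function name `sample_skewness` is our own, not from the text), the sample-skewness formula above can be computed directly. It divides by the sample standard deviation s, the same adjusted formula used by spreadsheet functions such as Excel's SKEW. Applied to the 12 monthly starting salaries used later in this section, it gives about 1.09.

```python
from statistics import mean, stdev


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum(((x - xbar) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("sample skewness needs at least 3 observations")
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    cubed_z = sum(((x - xbar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed_z


# Monthly starting salaries of 12 business school graduates
salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(round(sample_skewness(salaries), 4))  # 1.0911
```

A positive value such as this one indicates a distribution that is skewed to the right.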
Skewness
[Figure: negatively skewed (mean < median < mode), symmetric, not skewed (mean = median = mode) and positively skewed (mode < median < mean) distributions]
Coefficient of Skewness
Summary measure for skewness
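$$S = \frac{3(\mu - M_d)}{\sigma}$$
where $\mu$ is the mean, $M_d$ the median and $\sigma$ the standard deviation.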
If S < 0, the distribution is negatively skewed
(skewed to the left).
If S = 0, the distribution is symmetric (not
skewed).
If S > 0, the distribution is positively skewed
(skewed to the right).
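Coefficient of Skewness
Example: three distributions, each with median $M_d = 26$ and standard deviation $\sigma = 12.3$, with means $\mu_1 = 23$, $\mu_2 = 26$ and $\mu_3 = 29$.
$$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73 \qquad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0 \qquad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = 0.73$$
Distribution 1 is negatively skewed, distribution 2 is symmetric and distribution 3 is positively skewed.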
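The coefficient of skewness and the sign rule can be sketched in a few lines of Python; the function names are our own. The loop reproduces the three-distribution example, in which each distribution has median 26 and standard deviation 12.3.

```python
def coefficient_of_skewness(mu, median, sigma):
    """Coefficient of skewness S = 3(mu - Md) / sigma."""
    return 3 * (mu - median) / sigma


def describe_skewness(s):
    """Sign rule: S < 0 negatively skewed, S = 0 symmetric, S > 0 positively skewed."""
    if s < 0:
        return "negatively skewed (skewed to the left)"
    if s > 0:
        return "positively skewed (skewed to the right)"
    return "symmetric (not skewed)"


# Example from the text: means 23, 26 and 29; median 26 and sigma 12.3 in each case
for mu in (23, 26, 29):
    s = coefficient_of_skewness(mu, 26, 12.3)
    print(mu, round(s, 2), describe_skewness(s))
```

With real data an exact zero is rare, so in practice a value close to 0 is read as approximately symmetric.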
Symmetric (not skewed)
Skewness is zero.
Mean and median are equal.
[Figure: relative frequency histogram, skewness = 0]
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram, skewness = -.31]
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram, skewness = .31]
Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.
[Figure: relative frequency histogram, skewness = 1.25]
Distribution Shape: Skewness (FOR PRACTICE)
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
[Figure: relative frequency histogram of the apartment rents, skewness = .92]
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic and platykurtic curves]
Descriptive statistics: salary
Data (monthly starting salaries, n = 12): 3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925
Mean 3540
Standard Error 47.81989569
Median 3505
Mode 3480
Standard Deviation 165.6529779
Sample Variance 27440.90909
Kurtosis 1.718883645
Skewness 1.091108688
Range 615
Minimum 3310
Maximum 3925
Sum 42480
Count 12
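A minimal Python sketch that reproduces the descriptive statistics table above with the standard library's statistics module; the helper names are our own. The kurtosis line assumes the table follows the usual spreadsheet convention (as in Excel's KURT), which reports excess kurtosis: roughly 0 for a normal (mesokurtic) shape, positive for leptokurtic and negative for platykurtic data.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance


def sample_skewness(data):
    """n / ((n - 1)(n - 2)) * sum of cubed z-scores (uses the sample standard deviation)."""
    n, xbar, s = len(data), mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in data)


def excess_kurtosis(data):
    """Adjusted excess kurtosis (about 0 for normally distributed data); needs n >= 4."""
    n, xbar, s = len(data), mean(data), stdev(data)
    fourth = sum(((x - xbar) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def descriptive_statistics(data):
    n, s = len(data), stdev(data)
    return {
        "Mean": mean(data),
        "Standard Error": s / sqrt(n),  # standard error of the mean
        "Median": median(data),
        "Mode": mode(data),
        "Standard Deviation": s,
        "Sample Variance": variance(data),
        "Kurtosis": excess_kurtosis(data),
        "Skewness": sample_skewness(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
for label, value in descriptive_statistics(salaries).items():
    print(f"{label:<20} {value:.4f}" if isinstance(value, float) else f"{label:<20} {value}")
```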
Relative Location: z-Score
* In addition to measures of location, variability, and shape, we are also interested in the relative location of values within a data set.
* Measures of relative location help us determine how far a particular value is from the mean.
* By using both the mean and standard deviation, we can determine the relative location of any observation.
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$$z_i = \frac{x_i - \bar{x}}{s}$$
Excel's STANDARDIZE function can be used to compute the z-score.
z-Scores
An observation's z-score is a measure of the relative location of the observation in a data set.
A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
z-Scores
z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
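A short Python sketch that recomputes the standardized values above from the 70 rents (standard library only; `z_scores` is a hypothetical helper name). It also applies the |z| > 3 rule of thumb for outliers that this chapter uses.

```python
from statistics import mean, stdev

# Monthly rents for the 70 efficiency apartments (ascending order)
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_scores(data):
    """Standardized values z_i = (x_i - xbar) / s, with s the sample standard deviation."""
    xbar, s = mean(data), stdev(data)
    return [(x - xbar) / s for x in data]


z = z_scores(rents)
print(round(mean(rents), 2), round(stdev(rents), 2))  # 490.8 54.74
print(round(min(z), 2), round(max(z), 2))             # -1.2 2.27
print([x for x, zi in zip(rents, z) if abs(zi) > 3])  # [] (no outliers)
```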
Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev's theorem requires z > 1, but z need not be an integer.
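Chebyshev's Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.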
Chebyshev's Theorem
Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$
(Actually, 86% of the rent values are between 409 and 573.)
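The Chebyshev calculation can be checked with a minimal Python sketch (standard library only; `chebyshev_bound` is our own helper). It computes the guaranteed minimum proportion for z = 1.5 and the proportion of the 70 rents that actually fall in the interval.

```python
from statistics import mean, stdev

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(z):
    """Minimum proportion of any data set within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


xbar, s = mean(rents), stdev(rents)
z = 1.5
low, high = xbar - z * s, xbar + z * s
actual = sum(low <= x <= high for x in rents) / len(rents)
print(f"At least {chebyshev_bound(z):.0%} must lie in [{low:.0f}, {high:.0f}]")  # 56%, [409, 573]
print(f"Actual proportion: {actual:.0%}")  # 86%
for k in (2, 3, 4):
    print(k, f"{chebyshev_bound(k):.0%}")  # 75%, 89%, 94%
```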
Empirical Rule
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the approximate percentage of data values that will be within a specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which is covered in a later chapter.
Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
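For comparison, the exact normal-curve proportions can be computed from the error function in Python's math module (a sketch; `proportion_within` is our own name). They come out as 68.27%, 95.45% and 99.73%; the slide's figures (68.26%, 95.44%, 99.72%) are consistent with doubling four-decimal normal-table areas, for example 2 × 0.3413 = 0.6826. The sketch also prints Chebyshev's lower bound, which holds for any distribution and is therefore much weaker.

```python
from math import erf, sqrt


def proportion_within(k):
    """P(|Z| < k) for a standard normal variable Z: the share within k standard deviations."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    chebyshev = 1 - 1 / k ** 2  # distribution-free lower bound (informative only for k > 1)
    print(f"within {k} sd: normal {proportion_within(k):.2%}, Chebyshev at least {chebyshev:.0%}")
```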
Empirical Rule
[Figure: normal curve centred at the mean, with 68.26% of values within ±1 standard deviation, 95.44% within ±2 and 99.72% within ±3]
Detecting Outliers
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
It might be:
an incorrectly recorded data value
a data value that was incorrectly included in the
data set
a correctly recorded data value that belongs in
the data set
Detecting Outliers (FOR PRACTICE)
The most extreme z-scores are -1.20 and 2.27
Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
Exploratory Data Analysis
Exploratory data analysis procedures enable us to use
simple arithmetic and easy-to-draw pictures to
summarize data.
We simply sort the data values into ascending order, identify the five-number summary and then construct a box plot.
FIVE NUMBER SUMMARY
1. MINIMUM
2. QUARTILE 1
3. MEDIAN
4. QUARTILE 3
5. MAXIMUM
The monthly starting salaries for a sample of 12 business school graduates are given below (in ascending order):
3310 3355 3450 3480 3480
3490 3520 3540 3550 3650
3730 3925
The five-number summary is:
Min = 3310
Q1 = 3465
Median = 3505
Q3 = 3600
Maximum = 3925
The data show a smallest value of 3310 and a largest value of 3925.
Approximately one-fourth, or 25%, of the observations lie between adjacent numbers in a five-number summary.
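A minimal Python sketch of the five-number summary. The quartile rule is an assumption: it uses the percentile convention that reproduces the quartiles in this chapter (compute i = (p/100)n; if i is an integer, average the i-th and (i+1)-th ordered values, otherwise round i up). Library routines such as `statistics.quantiles` use other conventions by default and can give slightly different quartiles.

```python
import math


def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) of ascending data, using the rule described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_data)
    i = p * n / 100  # position of the percentile
    if i.is_integer():
        i = int(i)
        # average the i-th and (i + 1)-th values (1-based positions)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(i) - 1]  # round the position up


def five_number_summary(data):
    s = sorted(data)
    return (s[0], percentile(s, 25), percentile(s, 50), percentile(s, 75), s[-1])


salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(five_number_summary(salaries))  # (3310, 3465.0, 3505.0, 3600.0, 3925)
```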
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.
Box and Whisker Plot
Five specific values are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
Inner Fences
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR
Outer Fences
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
Box and Whisker Plot
[Figure: box-and-whisker plot marking the minimum, Q1, Q2, Q3 and maximum]
Steps to Construct a Box Plot
1. A box is drawn with the ends of the box located at the first and third quartiles. For the salary data, Q1 = 3465 and Q3 = 3600. This box contains the middle 50% of the data.
2. A vertical line is drawn in the box at the location of the median (3505 for the salary data).
3. By using the interquartile range, IQR = Q3 - Q1, limits are located. The limits for the box plot are 1.5(IQR) below Q1 and 1.5(IQR) above Q3. For the salary data, IQR = Q3 - Q1 = 3600 - 3465 = 135. Thus, the limits are 3465 - 1.5(135) = 3262.5 and 3600 + 1.5(135) = 3802.5. Data outside these limits are considered outliers.
4. The dashed lines are called whiskers. The whiskers are drawn from the ends of the box to the smallest and largest values inside the limits computed in step 3. Thus, the whiskers end at salary values of 3310 and 3730.
5. Finally, the location of each outlier is shown with the symbol *. For the salary data, 3925 lies above the upper limit of 3802.5 and is shown as an outlier.
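The limit, whisker and outlier steps can be sketched in Python, taking Q1 and Q3 as inputs (standard library only; `box_plot_summary` is a hypothetical name). It uses 1.5(IQR) for the box-plot limits (inner fences) and also reports which outliers, if any, lie beyond the outer fences at 3.0(IQR).

```python
def box_plot_summary(data, q1, q3):
    """Limits, whisker ends and outliers for a box plot, given the first and third quartiles."""
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr              # inner fences (box plot limits)
    outer_lower, outer_upper = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # outer fences
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    beyond_outer = [x for x in outliers if x < outer_lower or x > outer_upper]
    return {
        "IQR": iqr,
        "limits": (lower, upper),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
        "beyond outer fences": beyond_outer,
    }


salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(box_plot_summary(salaries, q1=3465, q3=3600))
```

For the salary data this reports limits of 3262.5 and 3802.5, whiskers ending at 3310 and 3730, and 3925 as the only outlier; 3925 stays inside the upper outer fence of 4005.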

Five-Number Summary (Example: Apartment Rents)
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
Box Plot

A box is drawn with its ends located at the first and
third quartiles.
A vertical line is drawn in the box at the location of
the median (second quartile).
[Figure: box plot of apartment rents on an axis from 400 to 625, with the box from Q1 = 445 to Q3 = 525 and the median line at Q2 = 475]
Box Plot
Limits are located (not drawn) using the interquartile range (IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol *.
Box Plot

The lower limit is located 1.5(IQR) below Q1, where IQR = Q3 - Q1 = 525 - 445 = 80.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
Box Plot
Whiskers (dashed lines) are drawn from the ends
of the box to the smallest and largest data values
inside the limits.
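[Figure: box plot of apartment rents on an axis from 400 to 625, with whiskers extending to the smallest and largest values inside the limits]
Smallest value inside limits = 425; largest value inside limits = 615.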
Measures of Association Between Two Variables
Thus far we have examined numerical methods used to summarize the data for one variable at a time.
Often a manager or decision maker is interested in the relationship between two variables.
Two descriptive measures of the relationship between two variables are covariance and the correlation coefficient.
Covariance
The covariance is a measure of the linear association
between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

Covariance
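The covariance is computed as follows:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples}$$
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations}$$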
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
The correlation coefficient is computed as follows:
$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \qquad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations}$$
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
The closer the correlation is to zero, the weaker the relationship.
Types of Correlation
Positive vs. negative (direct vs. indirect), for example:
Sales of aerosol sprays & the greenhouse effect
Advertising & sales
Pollution emissions & anti-pollution expenditure
Linear vs. curvilinear
Linear: a change in one variable is accompanied by a corresponding change in the other in a constant ratio.
Curvilinear: the rate of change is not constant, e.g. the learning curve in some industries: once a product is being made, the time required to make one unit decreases by a fixed proportion each time the total number of units doubles.
Types of Correlation
Simple vs. partial vs. multiple correlation
Simple: two variables, e.g. crop output & fertilizer
Partial: two variables, with the influence of the other variables held constant, e.g. sales influenced by advertising, product quality, price and competition
Multiple: one variable related to several others together, e.g. job satisfaction & salary, advancement, job
Three Degrees of Correlation
[Figure: scatter plots illustrating r < 0, r = 0 and r > 0]
Coefficient of Correlation
+1: strong positive linear relationship
0 (ρ or r = 0): no linear relationship
-1: strong negative linear relationship
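Pearson Product-Moment Correlation Coefficient
$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$
$$-1 \le r \le 1$$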

Covariance and Correlation Coefficient
Example: Golfing Study
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.
Average Driving Distance (yds.)   Average 18-Hole Score
277.6   69
259.5   71
269.1   70
267.0   70
255.6   71
272.9   69
$x$   $y$   $(x_i - \bar{x})$   $(y_i - \bar{y})$   $(x_i - \bar{x})(y_i - \bar{y})$
277.6 69 10.65 -1.0 -10.65
259.5 71 -7.45 1.0 -7.45
269.1 70 2.15 0 0
267.0 70 0.05 0 0
255.6 71 -11.35 1.0 -11.35
272.9 69 5.95 -1.0 -5.95
Average 267.0 70.0 Total -35.40
Std. Dev. 8.2192 .8944

Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$
Sample Correlation Coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
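A minimal Python sketch of the sample covariance and correlation for the golfing data (helper names are our own; Python 3.10 and later also provide `statistics.covariance` and `statistics.correlation`).

```python
from math import sqrt


def sample_covariance(x, y):
    """s_xy = sum((x_i - xbar)(y_i - ybar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y), with s_x and s_y the sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                      # average 18-hole score
print(round(sample_covariance(distance, score), 2))   # -7.08
print(round(sample_correlation(distance, score), 4))  # -0.9631
```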
Computation of r
Day   Interest (X)   Futures Index (Y)   X²   Y²   XY
1 7.43 221 55.205 48,841 1,642.03
2 7.48 222 55.950 49,284 1,660.56
3 8.00 226 64.000 51,076 1,808.00
4 7.75 225 60.063 50,625 1,743.75
5 7.60 224 57.760 50,176 1,702.40
6 7.63 223 58.217 49,729 1,701.49
7 7.68 223 58.982 49,729 1,712.64
8 7.67 226 58.829 51,076 1,733.42
9 7.59 226 57.608 51,076 1,715.34
10 8.07 235 65.125 55,225 1,896.45
11 8.03 233 64.481 54,289 1,870.99
12 8.00 241 64.000 58,081 1,928.00
Summations 92.93 2,725 720.220 619,207 21,115.07
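Computation of r
$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815$$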
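The computational formula can be applied directly to the raw interest and futures-index data in a short Python sketch (`pearson_r` is our own helper). Working from unrounded values, it agrees with the hand calculation and with the correlation matrix value of 0.815254.

```python
from math import sqrt


def pearson_r(x, y):
    """Computational formula for r using the sums of X, Y, X^2, Y^2 and XY."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]
print(round(pearson_r(interest, futures_index), 3))  # 0.815
```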
Scatter Plot and Correlation Matrix for the Economics Example
[Figure: scatter plot of Futures Index (220 to 245) against Interest (7.40 to 8.20)]
Correlation matrix:
                 Interest   Futures Index
Interest         1
Futures Index    0.815254   1
Problem
A professor is trying to show his students the importance of tests, even though 90% of the final mark is determined by exams.
A random sample of 15 students gives the following test scores (T) and final scores (F):
T: 59 92 72 90 95 87 89 77 76 65 97 42 94 62 91
F: 65 84 77 80 77 81 80 84 80 69 83 40 78 65 90
Draw a scatter diagram.
Problem (contd.)
[Figure: scatter diagram, Test vs. Final Scores; final scores (0 to 100) plotted against test scores (0 to 150)]
Problem
An instructor is interested in finding out how the number of absentees on a given day is related to the mean temperature that day. A sample of 10 days gives:
Absentees: 8 7 5 4 2 3 5 6 8 9
Temperature: 10 20 25 30 40 45 50 55 59 60
What are the dependent variable (DV) and the independent variable (IV)? Draw a scatter diagram.
Explain the shape of the diagram.
Problem
[Figure: scatter diagram, Temperature vs. Absenteeism; absenteeism (0 to 10) plotted against temperature (0 to 80)]
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
xi = value of observation i
wi = weight for observation i
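A minimal Python sketch of the weighted mean formula. The GPA figures in the usage line are hypothetical, chosen only for illustration: grade points 4, 3 and 2 earned in courses of 3, 4 and 2 credit hours.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points weighted by credit hours
gpa = weighted_mean([4, 3, 2], [3, 4, 2])
print(round(gpa, 2))  # 3.11
```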
Grouped Data
The weighted mean computation can be used to
obtain approximations of the mean, variance, and
standard deviation for the grouped data.
To compute the weighted mean, we treat the
midpoint of each class as though it were the mean
of all items in the class.
We compute a weighted mean of the class midpoints
using the class frequencies as weights.
Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.
Mean for Grouped Data
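Sample Data
$$\bar{x} = \frac{\sum f_i M_i}{n}$$
Population Data
$$\mu = \frac{\sum f_i M_i}{N}$$
where:
fi = frequency of class i
Mi = midpoint of class i
n, N = total of the class frequencies (sample size, population size)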
Mean of Grouped Data
Weighted average of class midpoints
Class frequencies are the weights
$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$
Calculation of Grouped Mean
Class Interval Frequency Class Midpoint fM
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
Total 50 2150
$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$
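A minimal Python sketch of the grouped mean, treating each class midpoint as the value of every item in the class; the tuple layout (lower limit, upper limit, frequency) and the function name are our own choices.

```python
def grouped_mean(classes):
    """Approximate mean of grouped data: sum(f * M) / sum(f), with M the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / total_f


# (lower limit, upper limit, frequency) for the classes "20-under 30" ... "70-under 80"
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(grouped_mean(classes))  # 43.0
```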
Median of Grouped Data
$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W)$$
Where:
L = the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies
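Median of Grouped Data: Example
Class Interval   Frequency   Cumulative Frequency
20-under 30   6   6
30-under 40   18   24
40-under 50   11   35
50-under 60   11   46
60-under 70   3   49
70-under 80   1   50
N = 50
N/2 = 25, so the median class is 40-under 50, the first class whose cumulative frequency reaches 25 (L = 40, cfp = 24, fmed = 11, W = 10).
$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W) = 40 + \frac{\frac{50}{2} - 24}{11} (10) = 40.909$$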
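A minimal Python sketch of the grouped-median formula (the function name and the (lower, upper, frequency) layout are our own). It walks the cumulative frequencies to find the median class and then interpolates within it.

```python
def grouped_median(classes):
    """Median = L + ((N/2 - cfp) / fmed) * W for classes given as (lower, upper, frequency)."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one (cfp)
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            # median class: L = lower, fmed = f, W = upper - lower
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes with positive frequency")


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(round(grouped_median(classes), 3))  # 40.909
```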
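Variance and Standard Deviation of Grouped Data
Population
$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} \qquad \sigma = \sqrt{\sigma^2}$$
Sample
$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1} \qquad S = \sqrt{S^2}$$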
Population Variance and Standard Deviation of Grouped Data
Class Interval   f   M   fM   M - μ   (M - μ)²   f(M - μ)²
20-under 30   6   25   150   -18   324   1944
30-under 40   18   35   630   -8   64   1152
40-under 50   11   45   495   2   4   44
50-under 60   11   55   605   12   144   1584
60-under 70   3   65   195   22   484   1452
70-under 80   1   75   75   32   1024   1024
Total   50      2150            7200
$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144 \qquad \sigma = \sqrt{144} = 12$$
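A minimal Python sketch of the grouped variance, using class frequencies as weights on the squared deviations of the midpoints (function name and tuple layout are our own). The `sample=True` option divides by n - 1, as in the sample formula.

```python
from math import sqrt


def grouped_variance(classes, sample=False):
    """Variance of grouped data from class midpoints M and frequencies f.

    Population: sum(f * (M - mu) ** 2) / N; sample: sum(f * (M - xbar) ** 2) / (n - 1).
    """
    n = sum(f for _, _, f in classes)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    center = sum(f * m for m, f in midpoints) / n  # grouped mean
    ss = sum(f * (m - center) ** 2 for m, f in midpoints)
    return ss / (n - 1 if sample else n)


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]
population_variance = grouped_variance(classes)
print(population_variance, sqrt(population_variance))  # 144.0 12.0
```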
Parameters and Statistics
                             Population   Sample
Size                         N            n
Mean                         μ            x̄
Variance                     σ²           S²
Standard Deviation           σ            S
Coefficient of Variation     CV           cv
Covariance                   σxy          Sxy
Coefficient of Correlation   ρ            r
Chapter 3: pages 123-168
