
Quantitative methods for management
Descriptive statistics- Numerical measures

DAY 4
Recap
Day 1: Introduction, types of statistics, data and its types
Definition of statistics; terminology: population, sample, parameter, statistic, qualitative and quantitative data; levels of measurement: nominal, ordinal, interval and ratio; sources of data: primary and secondary; applications of statistics in various functions of management; data mining and data warehousing

Day 2: Classification of data: qualitative, quantitative, geographical and chronological
Presentation of data: frequency distribution, relative and cumulative frequencies; bivariate distributions
Diagrammatic: bar diagram, pie diagram
Graphical: histogram, frequency polygon, ogive
Exploratory data analysis: scatter diagram, stem-and-leaf plot

Day 3: Numerical measures
Central tendency (mean, percentiles and mode); dispersion (range, interquartile range and MAD)
Population standard deviation and variance; sample standard deviation, variance and coefficient of variation
Day 4
Distribution shape: skewness
Relative location: z-score
Detecting outliers
Exploratory data analysis
Five-number summary
Box plot: graphical representation of the five-number summary
Measures of association between variables
Covariance
Correlation
Grouped data
Weighted mean
Distribution Shape: Skewness
An important measure of the shape of a distribution
is called skewness.

The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Skewness can be easily computed using statistical software.
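
As a minimal sketch of what such software does, the sample skewness formula can be written directly in Python. The function name `sample_skewness` is an assumption made for illustration; the data are the 12 starting salaries used elsewhere in these slides, and the printed result should agree with the Skewness entry in the descriptive statistics output for that data (about 1.09).

```python
import math

def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# The 12 monthly starting salaries from the slides.
salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]
print(round(sample_skewness(salaries), 4))
```

A positive result indicates a longer right tail, which matches the single large salary (3925) in this sample.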

A histogram provides a graphical display showing the shape of a distribution.
Skewness

[Figure: three distribution curves labeled Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed, with the positions of the mean, median, and mode marked on each]
Coefficient of Skewness
Summary measure for skewness

$$S = \frac{3(\mu - M_d)}{\sigma}$$

If S < 0, the distribution is negatively skewed
(skewed to the left).
If S = 0, the distribution is symmetric (not
skewed).
If S > 0, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness
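$$\mu_1 = 23, \quad \mu_2 = 26, \quad \mu_3 = 29$$

$$M_{d1} = 26, \quad M_{d2} = 26, \quad M_{d3} = 26$$

$$\sigma_1 = 12.3, \quad \sigma_2 = 12.3, \quad \sigma_3 = 12.3$$

$$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$

$$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$

$$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$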
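
The three cases in this example can be checked with a short Python sketch. The function name `pearson_skewness` is chosen here for illustration only; the inputs are the means, median, and standard deviation from the slide.

```python
def pearson_skewness(mean, median, std_dev):
    """Coefficient of skewness: S = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

# Three distributions with the same median (26) and standard deviation (12.3).
for mean in (23, 26, 29):
    s = pearson_skewness(mean, 26, 12.3)
    print(f"mean = {mean}: S = {s:.2f}")
```

The sign of S follows the sign of (mean - median), so a mean below the median gives a negative coefficient and a mean above it gives a positive one.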
Distribution Shape: Skewness

Symmetric (not skewed)


Skewness is zero.
Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]
Distribution Shape: Skewness
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram, Skewness = -.31]
Distribution Shape: Skewness
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = .31]
Distribution Shape: Skewness
Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = 1.25]
Distribution Shape: Skewness (FOR PRACTICE)
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness

Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents, Skewness = .92]
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out

[Figure: three curves labeled Leptokurtic, Mesokurtic, and Platykurtic]
Salary data: 3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925

Descriptive statistics (salary)
Mean                  3540
Standard Error        47.81989569
Median                3505
Mode                  3480
Standard Deviation    165.6529779
Sample Variance       27440.90909
Kurtosis              1.718883645
Skewness              1.091108688
Range                 615
Minimum               3310
Maximum               3925
Sum                   42480
Count                 12
Relative Location: z-Score
* In addition to measures of location, variability, and shape, we are also interested in the relative location of values within a data set.

* Measures of relative location help us determine how far a particular value is from the mean.

* By using both the mean and standard deviation, we can determine the relative location of any observation.
z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel's STANDARDIZE function can be used to compute the z-score.
z-Scores

An observation's z-score is a measure of the relative location of the observation in a data set.
A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
z-Scores
Example: Apartment Rents
z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
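
A minimal Python sketch of the z-score calculation, using the sample mean (490.80) and sample standard deviation (54.74) quoted for the apartment rents. The function name `z_score` is an assumption for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

mean_rent, s_rent = 490.80, 54.74  # values quoted in the slides

print(round(z_score(425, mean_rent, s_rent), 2))  # smallest rent
print(round(z_score(615, mean_rent, s_rent), 2))  # largest rent
```

The two results correspond to the first and last entries of the standardized values table (-1.20 and 2.27).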
Chebyshev's Theorem
At least (1 - 1/z^2) of the items in any data set will be
within z standard deviations of the mean, where z is
any value greater than 1.

Chebyshev's theorem requires z > 1, but z need not be an integer.
Chebyshev's Theorem

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.
Chebyshev's Theorem
Example: Apartment Rents
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74

At least (1 - 1/(1.5)^2) = 1 - 0.44 = 0.56 or 56% of the rent values must be between
$\bar{x}$ - z(s) = 490.80 - 1.5(54.74) = 409
and
$\bar{x}$ + z(s) = 490.80 + 1.5(54.74) = 573

(Actually, 86% of the rent values are between 409 and 573.)
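
The comparison between the guaranteed minimum and the actual proportion can be sketched in Python. The function name `chebyshev_lower_bound` is an assumption for illustration; the rents are the 70 values listed in the apartment example, and the mean and standard deviation are the values quoted in the slides.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
mean_rent, s_rent = 490.80, 54.74  # values quoted in the slides

z = 1.5
low, high = mean_rent - z * s_rent, mean_rent + z * s_rent
# Proportion of rents that actually fall inside the interval.
actual = sum(low <= r <= high for r in rents) / len(rents)

print(f"guaranteed at least {chebyshev_lower_bound(z):.0%}, actual {actual:.0%}")
```

The bound holds for any data set, which is why it is usually much lower than the proportion observed in a particular sample.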
Empirical Rule

When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.

The empirical rule is based on the normal distribution, which is covered in a later chapter.
Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: bell-shaped curve showing 68.26%, 95.44%, and 99.72% of the values within 1, 2, and 3 standard deviations of the mean]
Detecting Outliers
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
It might be:
an incorrectly recorded data value
a data value that was incorrectly included in the
data set
a correctly recorded data value that belongs in
the data set
Detecting Outliers FOR PRACTICE
Example: Apartment Rents
The most extreme z-scores are -1.20 and 2.27
Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
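
The |z| > 3 screening rule can be sketched in Python with the standard library. The function name `z_score_outliers` and the `threshold` parameter are assumptions for illustration; the data are the 70 apartment rents from the example.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [x for x in values if abs((x - mean) / s) > threshold]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
print(z_score_outliers(rents))  # an empty list means no value was flagged
```

A flagged value is only a candidate for review; as the slide notes, it may still be a correctly recorded value that belongs in the data set.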
Exploratory Data Analysis
Exploratory data analysis procedures enable us to use
simple arithmetic and easy-to-draw pictures to
summarize data.

We simply sort the data values into ascending order and identify the five-number summary and then construct a box plot.
FIVE NUMBER SUMMARY
1. MINIMUM
2. QUARTILE 1
3. MEDIAN
4. QUARTILE 3
5. MAXIMUM
The monthly starting salaries for a sample of 12 business
school graduates are given below (in ascending order):
3310 3355 3450 3480 3480
3490 3520 3540 3550 3650
3730 3925
THE FIVE-NUMBER SUMMARY IS
Min = 3310
Q1 = 3465
Median = 3505
Q3 = 3600
Maximum = 3925
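
The slides do not state which quartile convention was used. The sketch below assumes a common textbook rule: compute the position i = (p/100)n; if i is not an integer, round up and take that value; if i is an integer, average the values in positions i and i + 1. Under that assumption the salary figures above are reproduced. The function names are illustrative, and other software conventions can give slightly different quartiles.

```python
def percentile(sorted_values, p):
    """p-th percentile (integer p, 0 < p < 100) using the position rule i = (p/100) * n."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (1-based).
        return sorted_values[whole]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    return (data[0], percentile(data, 25), percentile(data, 50),
            percentile(data, 75), data[-1])

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]
print(five_number_summary(salaries))
```

Integer arithmetic is used for the position test so that floating-point rounding cannot change which branch is taken.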
The data shows a smallest value of 3310 and a
largest value of 3925.

Approximately one-fourth, or 25%, of the observations are between adjacent numbers in a five-number summary.
Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.


Box and Whisker Plot
Five specific values are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
Inner Fences
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR
Outer Fences
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
Box and Whisker Plot

[Figure: box-and-whisker plot marking Minimum, Q1, Q2, Q3, and Maximum]
Steps to construct Box Plot
1. A box is drawn with the ends of the box located at the first and third quartiles. For
the salary data, Q1 = 3465 and Q3 = 3600. This box contains the middle 50% of the data.

2. A vertical line is drawn in the box at the location of the median (3505 for the salary
data).

3. By using the interquartile range, IQR = Q3 - Q1, limits are located. The limits for the
box plot are 1.5(IQR) below Q1 and 1.5(IQR) above Q3. For the salary data, IQR =
Q3 - Q1 = 3600 - 3465 = 135. Thus, the limits are 3465 - 1.5(135) = 3262.5 and
3600 + 1.5(135) = 3802.5. Data outside these limits are considered outliers.

4. The dashed lines are called whiskers. The whiskers are drawn from the
ends of the box to the smallest and largest values inside the limits computed in step 3.
Thus, the whiskers end at salary values of 3310 and 3730.

5. Finally, the location of each outlier is shown with the symbol *.
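
Steps 3 to 5 can be sketched in Python for the salary data. The function name `box_plot_limits` and the dictionary keys are assumptions for illustration; Q1 and Q3 are taken from the slide rather than recomputed.

```python
def box_plot_limits(values, q1, q3, k=1.5):
    """Compute IQR, limits, whisker ends, and outliers for a box plot."""
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if lower <= x <= upper]
    return {
        "iqr": iqr,
        "lower_limit": lower,
        "upper_limit": upper,
        # Whiskers end at the smallest and largest values inside the limits.
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in values if x < lower or x > upper],
    }

salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]
summary = box_plot_limits(salaries, q1=3465, q3=3600)
print(summary)
```

For this data the only value outside the limits is 3925, which is why the upper whisker stops at 3730.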


Five-Number Summary
Example: Apartment Rents
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Box Plot

Example: Apartment Rents


A box is drawn with its ends located at the first and
third quartiles.
A vertical line is drawn in the box at the location of
the median (second quartile).

[Figure: box drawn on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Box Plot

Limits are located (not drawn) using the interquartile range (IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol *.
Box Plot

Example: Apartment Rents


The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
Box Plot
Example: Apartment Rents
Whiskers (dashed lines) are drawn from the ends
of the box to the smallest and largest data values
inside the limits.

[Figure: box plot on an axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]
Measures of Association Between Two Variables
Thus far we have examined numerical methods used to summarize the data for one variable at a time.

Often a manager or decision maker is interested in the relationship between two variables.

Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
Covariance
The covariance is a measure of the linear association
between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.


Covariance

The covariance is computed as follows:

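$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples}$$

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations}$$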
Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Correlation Coefficient

The correlation coefficient is computed as follows:


$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples}$$

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations}$$
Correlation Coefficient
The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

The closer the correlation is to zero, the weaker the relationship.
Types of Correlation
Positive vs. Negative (or Direct vs. Indirect)
Sales of aerosol sprays & the greenhouse effect
Advertising & sales
Pollution emissions & anti-pollution expenditure
Linear vs. Curvilinear
Linear: the change in one variable bears a constant ratio to the corresponding change in the other.
Curvilinear: the rate of change is not constant. Example: the learning curve in some industries, where the time required to make one unit of a product decreases by a fixed proportion each time the total number of units doubles.
Types of Correlation
Simple vs. Partial vs. Multiple Correlation
Simple: 2 variables, e.g. crop output & fertilizer
Partial: 2 variables, with the influence of the other variables kept constant, e.g. sales influenced by advertising, product quality, price, competition
Multiple: three or more variables, e.g. job satisfaction & salary, advancement, job.
Three Degrees of Correlation

[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0]
Coefficient of Correlation
$\rho$ or r = +1: Strong positive linear relationship

$\rho$ or r = 0: No linear relationship

$\rho$ or r = -1: Strong negative linear relationship


Pearson Product-Moment Correlation
Coefficient

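$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

$$-1 \le r \le 1$$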
Covariance and Correlation Coefficient

Example: Golfing Study


A golfer is interested in investigating the
relationship, if any, between driving distance and
18-hole score.
Average Driving Distance (yds.)    Average 18-Hole Score
277.6 69
259.5 71
269.1 70
267.0 70
255.6 71
272.9 69
Covariance and Correlation Coefficient

Example: Golfing Study

x          y       $(x_i - \bar{x})$   $(y_i - \bar{y})$   $(x_i - \bar{x})(y_i - \bar{y})$
277.6      69      10.65               -1.0                -10.65
259.5      71      -7.45               1.0                 -7.45
269.1      70      2.15                0                   0
267.0      70      0.05                0                   0
255.6      71      -11.35              1.0                 -11.35
272.9      69      5.95                -1.0                -5.95
Average    267.0   70.0                                    Total -35.40
Std. Dev.  8.2192  .8944
Covariance and Correlation Coefficient

Example: Golfing Study


Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
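
The golfing calculation can be reproduced with a short Python sketch built on the sample formulas. The function names are assumptions for illustration; the data are the six driving distance and score pairs from the example.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                       # average 18-hole score

print(round(sample_covariance(distance, score), 2))
print(round(sample_correlation(distance, score), 4))
```

The negative sign reflects that longer average drives go with lower (better) scores in this sample.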
Computation of r
Day   Interest (X)   Futures Index (Y)   X^2   Y^2   XY
1 7.43 221 55.205 48,841 1,642.03
2 7.48 222 55.950 49,284 1,660.56
3 8.00 226 64.000 51,076 1,808.00
4 7.75 225 60.063 50,625 1,743.75
5 7.60 224 57.760 50,176 1,702.40
6 7.63 223 58.217 49,729 1,701.49
7 7.68 223 58.982 49,729 1,712.64
8 7.67 226 58.829 51,076 1,733.42
9 7.59 226 57.608 51,076 1,715.34
10 8.07 235 65.125 55,225 1,896.45
11 8.03 233 64.481 54,289 1,870.99
12 8.00 241 64.000 58,081 1,928.00
Summations 92.93 2,725 720.220 619,207 21,115.07
Computation of r

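$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

$$r = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815$$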
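
The same computation can be sketched in Python using the sums-based form of the formula. The function name `pearson_r` is an assumption for illustration; the data are the 12 days of interest rates and futures index values from the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the sums of X, Y, X^2, Y^2, and XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_x * ss_y)

interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63,
            7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223,
           223, 226, 226, 235, 233, 241]

print(round(pearson_r(interest, futures), 3))
```

Because the sums-based form subtracts two nearly equal quantities, it is sensitive to rounding in the intermediate sums; working from the raw data avoids carrying rounded table values.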
Scatter Plot and Correlation Matrix
for the Economics Example

[Figure: scatter plot of Futures Index (220 to 245) against Interest (7.40 to 8.20)]

                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
Problem
A professor is trying to show his students
the importance of tests even though 90%
of final marks is determined by exams.
Random sample of 15 students:
T: 59 92 72 90 95 87 89 77 76 65 97 42 94 62 91
F: 65 84 77 80 77 81 80 84 80 69 83 40 78 65 90
Draw a scatter diagram
Problem (contd.)
[Figure: scatter diagram, Test vs. Final Scores; x-axis: Test Scores (0 to 150), y-axis: Final Scores (0 to 100)]
Problem
An instructor is interested in finding out
how the number of absentees on a given
day is related to the mean temperature that day.
Sample of 10 days:
Abs:  8 7 5 4 2 3 5 6 8 9
Temp: 10 20 25 30 40 45 50 55 59 60
What are the DV and the IV? Draw a scatter diagram.
Explain the shape of the diagram.
Problem

[Figure: scatter diagram, Temperature vs. Absenteeism; x-axis: Temperature (0 to 80), y-axis: Absenteeism (0 to 10)]
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
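
A minimal Python sketch of the weighted mean. The function name `weighted_mean` and the grade and credit-hour figures are hypothetical, chosen only to illustrate the GPA case; they do not come from the slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical example: grade points weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 3, 4]
print(weighted_mean(grade_points, credit_hours))
```

With equal weights the function reduces to the ordinary arithmetic mean.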
Grouped Data
The weighted mean computation can be used to
obtain approximations of the mean, variance, and
standard deviation for the grouped data.
To compute the weighted mean, we treat the
midpoint of each class as though it were the mean
of all items in the class.
We compute a weighted mean of the class midpoints
using the class frequencies as weights.
Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.
Mean for Grouped Data
Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i
Mean of Grouped Data
Weighted average of class midpoints
Class frequencies are the weights


$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$
Calculation of Grouped Mean
Class Interval   Frequency (f)   Class Midpoint (M)   fM
20-under 30      6               25                   150
30-under 40      18              35                   630
40-under 50      11              45                   495
50-under 60      11              55                   605
60-under 70      3               65                   195
70-under 80      1               75                   75
Total            50                                   2150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$
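
The grouped mean is a weighted mean of the class midpoints, which can be sketched in Python as follows. The function name `grouped_mean` is an assumption for illustration; the frequencies and midpoints are those in the table.

```python
def grouped_mean(frequencies, midpoints):
    """Approximate mean of grouped data: sum(f * M) / sum(f)."""
    total = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / total

frequencies = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]
print(grouped_mean(frequencies, midpoints))
```

The result is an approximation because every item in a class is treated as if it sat at the class midpoint.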
Median of Grouped Data

$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W)$$

Where:
L = the lower limit of the median class
$cf_p$ = cumulative frequency of class preceding the median class
$f_{med}$ = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data -- Example
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
N = 50

$$M_d = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W) = 40 + \frac{\frac{50}{2} - 24}{11} (10) = 40.909$$
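
The median-class search and interpolation can be sketched in Python. The function name `grouped_median` is an assumption for illustration, and the sketch assumes all classes share the same width, as they do in this example.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of grouped data: L + ((N/2 - cf_p) / f_med) * W."""
    half = sum(frequencies) / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class: interpolate within it.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")

lower_limits = [20, 30, 40, 50, 60, 70]
frequencies = [6, 18, 11, 11, 3, 1]
print(round(grouped_median(lower_limits, 10, frequencies), 3))
```

The median class is the first class whose cumulative frequency reaches N/2, here the 40-under 50 class.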
Variance and Standard Deviation
of Grouped Data

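Population

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample

$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}$$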
Population Variance and Standard
Deviation of Grouped Data

Class Interval   f    M    fM     $M - \mu$   $(M - \mu)^2$   $f(M - \mu)^2$
20-under 30      6    25   150    -18         324             1944
30-under 40      18   35   630    -8          64              1152
40-under 50      11   45   495    2           4               44
50-under 60      11   55   605    12          144             1584
60-under 70      3    65   195    22          484             1452
70-under 80      1    75   75     32          1024            1024
Total            50        2150                               7200

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144, \qquad \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$
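
The population variance and standard deviation for the grouped data can be reproduced with a short Python sketch. The function name `grouped_population_variance` is an assumption for illustration; the frequencies and midpoints are those in the table.

```python
import math

def grouped_population_variance(frequencies, midpoints):
    """Population variance of grouped data: sum(f * (M - mu)^2) / N."""
    n = sum(frequencies)
    mu = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    return sum(f * (m - mu) ** 2 for f, m in zip(frequencies, midpoints)) / n

frequencies = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

variance = grouped_population_variance(frequencies, midpoints)
print(variance, math.sqrt(variance))
```

For sample data the same sum of weighted squared deviations would be divided by n - 1 instead of N.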
Parameters and Statistics
                             Population       Sample
Size                         N                n
Mean                         $\mu$            $\bar{x}$
Variance                     $\sigma^2$       $S^2$
Standard Deviation           $\sigma$         S
Coefficient of Variation     CV               cv
Covariance                   $\sigma_{xy}$    $S_{xy}$
Coefficient of Correlation   $\rho$           r
Chapter 3 : page 123-168
