Topic 5 Measures of Dispersion

INTRODUCTION
We have discussed earlier position quantities such as the mean and quartiles,
which can be used to summarise distributions. However, these quantities are
ordered numbers located on the horizontal axis of the distribution graph. As
numbers along a line, they cannot describe quantitatively, for example, the
shape of the distribution.

In this topic we will learn about quantities that measure the shape of a
distribution. For example, the variance is usually used to measure the
dispersion of observations around their mean. The range is used to describe the
coverage of a given data set. The coefficient of skewness is used to measure
the asymmetry of a distribution curve. The coefficient of kurtosis is used to
measure the peakedness of a distribution curve.

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Describe the concept of dispersion measures;
2. Explain the concept of range as a dispersion measure;
3. Categorise the distribution curve by its symmetry and non-symmetry; and
4. Analyse variance and standard deviation.

5.1 MEASURE OF DISPERSION
The mean of a distribution is termed a location parameter. The locations of any
two distributions can be compared by looking at their respective means. The
range tells us about the coverage of a distribution, whilst the variance
measures the spread of observations around their mean and hence the shape of
the distribution curve. A small variance means the distribution curve is more
pointed, and a larger variance indicates that the curve is flatter. Thus, the
variance is sometimes called a shape parameter.

Figure 5.1(a) shows two distribution curves with different location centres but
possibly the same dispersion measure (they may have the same range of coverage,
but different variances). Curve 1 could represent the distribution of
mathematics marks of male students from School A, and Curve 2 the distribution
of mathematics marks of female students in the same examination from the same
school.

Figure 5.1 (a): Two distribution curves with different location centres but possibly of
same dispersion measure

ACTIVITY 5.1
Is it important to comprehend quantities like mean and quartiles to
prepare you to study this topic? Give your reasons.

Figure 5.1(b) shows two distribution curves with the same location centre but
possibly different dispersion measures (they may have different ranges of
coverage, as well as different variances). Curve 3 could represent the
distribution of physics marks of male students from School A, and Curve 4 the
distribution of physics marks of male students in the same examination but from
School B.

Figure 5.1 (b): Two distribution curves with same location centre but possibly of
different dispersion measures

Figure 5.1(c) shows two distribution curves with different location centres but
possibly the same dispersion measure (they may have the same range of coverage,
and possibly the same variance). However, Curve 5 is slightly skewed to the
right and Curve 6 is slightly skewed to the left. Curve 5 could represent the
distribution of mathematics marks of students from School A, and Curve 6 the
distribution of mathematics marks of students in the same trial examination but
from a different school.

Figure 5.1 (c): Two distribution curves with different location centres but possibly of
same dispersion measure

Figures 5.1(a), (b) and (c) show that, besides the mean, we need to know other
quantities such as the variance, the range and the coefficient of skewness in
order to describe or summarise a given distribution completely.

The following are examples of dispersion measures:
(a) Dispersion Measure Around the Mean of a Distribution
It measures the deviation of observations from their mean. There are two
types that can be considered:
(i) Mean Deviation; and
(ii) Standard Deviation.
However, in this module we will consider only the Standard Deviation. You can
refer to any statistics book for the Mean Deviation.
(b) Central Percentage Dispersion Measure
This measure is related to the median. There are two types that can be
considered:
(i) Central Percentage Range 10-90; and
(ii) Semi Inter Quartile Range.
(c) Distribution Coverage
This quantity measures the range of the whole distribution, which shows the
overall coverage of observations in the data set.
Dispersion measure involves measuring the degree of scatteredness of
observations around their mean centre.

5.2 THE RANGE

The range is defined as the difference between the maximum value and the
minimum value of observations.

Thus,

\[ \text{Range} = \text{maximum value} - \text{minimum value} \qquad (5.1) \]

As can be seen from formula (5.1), the range can be easily calculated. However,
it depends only on the two extreme values to measure the overall data coverage.
It does not explain anything about the variation of observations between the
two extreme values.

Example 5.1

Comment on the scatter of observations in each of the following data sets:

Set 1: 12, 6, 7, 3, 15, 5, 10, 18, 5
Set 2: 9, 3, 8, 8, 9, 7, 8, 9, 18

Solution

(a) Arrange the observations in ascending order, and draw a scatter point plot
for each set of data (Figure 5.2).
Set 1: 3, 5, 5, 6, 7, 10, 12, 15, 18
Set 2: 3, 7, 8, 8, 8, 9, 9, 9, 18

Figure 5.2

(b) Both data sets have the same range, which is 18 - 3 = 15.
(c) Observations in Set 1 are scattered almost evenly throughout the range.
However, for Set 2, most of the observations are concentrated around the
numbers 8 and 9.
(d) We can consider the numbers 3 and 18 as outliers to the main body of the
data in Set 2.
(e) From this exercise we learn that it is not good enough to compare only the
overall data coverage using the range. Some other dispersion measures have to
be considered too.

To conclude, Figure 5.2 shows that two distributions can have the same range but
different shapes, which the range cannot explain.
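
As a minimal sketch of the point made in Example 5.1, the Python snippet below
computes the range of Set 1 and Set 2 and then counts how many observations
fall between 7 and 9. The helper names data_range and count_within, and the
interval 7 to 9, are illustrative choices and not part of the module.

```python
# Data from Example 5.1.
set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]
set_2 = [9, 3, 8, 8, 9, 7, 8, 9, 18]


def data_range(values):
    """Range = maximum value - minimum value, as in formula (5.1)."""
    return max(values) - min(values)


def count_within(values, low, high):
    """Count the observations lying in the closed interval [low, high]."""
    return sum(1 for v in values if low <= v <= high)


range_1 = data_range(set_1)
range_2 = data_range(set_2)

# Same range, but very different concentration around 8 and 9.
near_centre_1 = count_within(set_1, 7, 9)
near_centre_2 = count_within(set_2, 7, 9)

print("Range:", range_1, range_2)
print("Observations between 7 and 9:", near_centre_1, near_centre_2)
```

Both sets give a range of 15, yet only one observation of Set 1 lies between 7
and 9 against seven observations of Set 2, which is what the range alone cannot
reveal.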

ACTIVITY 5.2
Range does not explain the density of a data set. What do you
understand about this statement? Discuss it with your coursemates.

EXERCISE 5.1
1. The following are two sets of mathematics marks from an examination:
Set A: 45, 48, 52, 54, 55, 55, 57, 59, 60, 65
Set B: 25, 32, 40, 45, 53, 60, 61, 71, 78, 85
(i) Calculate the mean and the range of both data sets.
(ii) Comment on the scatter of observations in both sets.

2. Below are two sets of physics marks in an examination:
Set C: 35, 62, 42, 75, 26, 50, 57, 8, 88, 80, 18, 83
Set D: 50, 42, 60, 62, 57, 43, 46, 56, 53, 88, 8, 59
(i) Calculate the mean and the range of both data sets.
(ii) Comment on the scatter of observations in both sets.

5.3 INTER QUARTILE RANGE

The inter quartile range is the difference between $Q_3$ and $Q_1$. It is used
to measure the range of the central 50% main body of a data distribution.

A larger inter quartile range indicates that the observations in the central
main body are more scattered. This measure can be used to complement the overall
range of the data, as the latter fails to explain the variation of observations
between the two extreme values.

Besides, the inter quartile range does not depend on the two extreme values.
Thus it can be used to measure the dispersion of the main body of the data. It
is also recommended as a complement to the overall range when we compare two
sets of data.

For example, let us consider question 2 in Exercise 5.1, where Set C is compared
with Set D. Although they have the same overall range (88 - 8 = 80), they have
different distributions. The inter quartile range for Set C is larger than that
for Set D. This indicates that the main body of Set D is less scattered than the
main body of Set C.

The inter quartile range is given by:

\[ IQR = |Q_3 - Q_1| \qquad (5.2) \]

where the bars | | denote the absolute value. Some reference books prefer to use
the semi inter quartile range, which is given by:

\[ Q = \frac{IQR}{2} = \frac{|Q_3 - Q_1|}{2} \qquad (5.3) \]

Example 5.2

By using the inter quartile range, compare the spread of data between Set C and
Set D in question 2 of Exercise 5.1.

Solution
(a) The number of observations is n = 12 for both data sets.
(b) $Q_1$ is at position $\frac{1}{4}(n + 1) = 3.25$. With the observations
arranged in ascending order:
Set C: 8, 18, 26, 35, 42, 50, 57, 62, 75, 80, 83, 88
Set D: 8, 42, 43, 46, 50, 53, 56, 57, 59, 60, 62, 88
Set C: $Q_1 = 26 + 0.25(35 - 26) = 28.25$;
Set D: $Q_1 = 43 + 0.25(46 - 43) = 43.75$.
(c) $Q_3$ is at position $\frac{3}{4}(n + 1) = 9.75$.
Set C: $Q_3 = 75 + 0.75(80 - 75) = 78.75$;
Set D: $Q_3 = 59 + 0.75(60 - 59) = 59.75$.
(d) The inter quartile range for each data set is then given by
IQR(C) = 78.75 - 28.25 = 50.5; and
IQR(D) = 59.75 - 43.75 = 16.0.
(e) Since IQR(D) < IQR(C), data Set D is considered less spread than
Set C.
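
The calculation in Example 5.2 can be sketched in Python as follows. This is
only an illustration under the assumption that quartiles are located at
position k(n + 1)/4 with linear interpolation between neighbouring
observations, as in the example; the function names quartile and
inter_quartile_range are hypothetical. Other textbooks and software packages
use different quartile conventions and may give slightly different values.

```python
def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) using position k(n + 1)/4."""
    n = len(sorted_values)
    position = k * (n + 1) / 4      # 1-based position, possibly fractional
    lower = int(position)           # whole part: rank of the lower neighbour
    fraction = position - lower     # fractional part used for interpolation
    if lower < 1:
        return float(sorted_values[0])
    if lower >= n:
        return float(sorted_values[-1])
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


def inter_quartile_range(values):
    """IQR = |Q3 - Q1|, as in formula (5.2)."""
    ordered = sorted(values)
    return abs(quartile(ordered, 3) - quartile(ordered, 1))


# Sets C and D from question 2 of Exercise 5.1.
set_c = [35, 62, 42, 75, 26, 50, 57, 8, 88, 80, 18, 83]
set_d = [50, 42, 60, 62, 57, 43, 46, 56, 53, 88, 8, 59]

iqr_c = inter_quartile_range(set_c)
iqr_d = inter_quartile_range(set_d)
print("IQR(C) =", iqr_c)
print("IQR(D) =", iqr_d)
```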

Coefficient of Quartile Variation, $V_Q$
The inter quartile range (IQR) and the semi inter quartile range (Q) are two
quantities which have dimensions. Therefore they become meaningless when used to
compare two data sets of different units, for instance data on age (years) and
weight (kg). To avoid this problem, we can use the coefficient of quartile
variation, which has no dimension and is given by:

\[ V_Q = \frac{Q}{TTQ} = \frac{|Q_3 - Q_1| / 2}{(Q_3 + Q_1) / 2} = \frac{|Q_3 - Q_1|}{Q_3 + Q_1} \qquad (5.4) \]

In the above formula (5.4), TTQ is the mid point between $Q_1$ and $Q_3$, and
the two bars | | denote the absolute value.
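
As a rough illustration of the coefficient of quartile variation, the sketch
below applies it to the quartiles already obtained for Sets C and D in Example
5.2. The function name coefficient_of_quartile_variation is an assumed name
used only for this sketch.

```python
def coefficient_of_quartile_variation(q1, q3):
    """V_Q = |Q3 - Q1| / (Q3 + Q1); the result has no unit."""
    semi_iqr = abs(q3 - q1) / 2     # semi inter quartile range, Q
    mid_point = (q3 + q1) / 2       # TTQ, the mid point between Q1 and Q3
    return semi_iqr / mid_point


# Quartiles taken from Example 5.2.
vq_c = coefficient_of_quartile_variation(28.25, 78.75)
vq_d = coefficient_of_quartile_variation(43.75, 59.75)

print("V_Q for Set C:", round(vq_c, 3))
print("V_Q for Set D:", round(vq_d, 3))
```

Because the coefficient is a ratio, it stays the same if every observation is
converted to another unit by a constant factor.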

EXERCISE 5.2
Given the following three sets of data:

Set E:
Age (years)              5-14  15-24  25-34  35-44  45-54  55-64  65-74     Σf
Number of Residents        35     90    120     98    130     52     25    550

Set F:
Value of Products (RM) x 100    10-14.99  15-19.99  20-24.99  25-29.99  30-34.99  35-39.99  40-44.99        Σf
Number of Products                     2         6        15        22        35        15         5       100

Set G:
Extra Charges (RM)    1.00-1.02  1.03-1.05  1.06-1.08  1.09-1.11  1.12-1.14  1.15-1.17  1.18-1.20         Σf
Number of Shops               3         15         28         30         25         14          5        120

1. Calculate $Q_1$, $Q_2$ and $Q_3$ for each data set.
2. Obtain the inter quartile range (IQR) for each data set.
3. Then compare the spread of the above data sets.
5.4 VARIANCE AND STANDARD DEVIATION

If we have two distributions, the one with the larger variance is more spread
out and hence its frequency curve is flatter. The variance of a population is
denoted by the symbol $\sigma^2$. The variance is never negative. The standard
deviation is obtained by taking the square root of the variance. In this module,
we will consider the given data as a population.
5.4.1 Standard Deviation and the Variance of Ungrouped Data
Suppose we have n numbers $x_1, x_2, \ldots, x_n$, with their mean (given or
calculated) as $\mu$. Then the standard deviation is given by:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \qquad (5.5) \]

In words, it is the square root of the average of the squared distances of each
score (or observation) from the mean. It is never negative. The population
variance ($\sigma^2$) is the square of the standard deviation.
Table 5.1: Steps of Obtaining Population Standard Deviation

Steps                                                    Symbols Used
(a) Calculate the population mean                        $\mu$
(b) Obtain the deviation of each score from the mean     $(x_i - \mu)$, $i = 1, 2, \ldots, n$
(c) Obtain the square of each deviation in step (b)      $(x_i - \mu)^2$, $i = 1, 2, \ldots, n$
(d) Obtain the average of the squared deviations         $\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
(e) Obtain the square root of the average in step (d)    $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$

Variance is defined as the average of the squared distances of each score (or
observation) from the mean. It is used to measure the spread of data.

Example 5.3

Obtain the standard deviation of the data set 20, 30, 40, 50, 60.

Solution

Variable (x)         Mean Deviation ($x - \mu$)    Squared Mean Deviation ($(x - \mu)^2$)
20                   -20                           400
30                   -10                           100
40                     0                             0
50                    10                           100
60                    20                           400
Sum = 200                                          Sum = 1000
$\mu = 200/5 = 40$                                 Mean squared deviation = 1000/5 = 200

Using formula (5.5), the standard deviation of the population is
$\sigma = \sqrt{200} \approx 14.14$.
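
The steps of Table 5.1 can be traced in a few lines of Python. The sketch below
uses the data of Example 5.3 and treats it as a population, as this module
does; the variable names are illustrative only. The last lines compare the
result with statistics.pstdev from the Python standard library, which computes
the population standard deviation.

```python
import math
import statistics

data = [20, 30, 40, 50, 60]
n = len(data)

# Step (a): population mean.
mean = sum(data) / n

# Step (b): deviation of each score from the mean.
deviations = [x - mean for x in data]

# Step (c): square of each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step (d): average of the squared deviations (the population variance).
variance = sum(squared_deviations) / n

# Step (e): square root of the average (the population standard deviation).
sigma = math.sqrt(variance)

print("mean =", mean)
print("variance =", variance)
print("standard deviation =", round(sigma, 2))

# Cross-check against the standard library.
library_sigma = statistics.pstdev(data)
print("statistics.pstdev =", round(library_sigma, 2))
```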

EXERCISE 5.3
1. Obtain the standard deviation of data Sets 1 and 2 in Example 5.1.
2. Obtain the standard deviation of data Sets A, B, C and D in
Exercise 5.1.

5.4.2 Alternative Formulas to Enhance Hand Calculations
(a) To avoid subtracting $\mu$ from each score, the equivalent formula (5.6)
can be used to calculate the standard deviation:

\[ \sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} \qquad (5.6) \]

Example 5.4

Obtain the standard deviation of data Set 2 in Example 5.1.

Solution

For Set 2, n = 9, $\sum x_i = 79$ and $\sum x_i^2 = 817$. From formula (5.6),
the standard deviation is

\[ \sigma = \sqrt{\frac{817}{9} - \left(\frac{79}{9}\right)^2} \approx 3.71 \]

(You can compare this with the answer obtained in Exercise 5.3.)
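
A short Python sketch of formula (5.6), applied to Set 2 of Example 5.1, is
given below. The function name std_dev_from_sums is an assumed name for this
illustration. The formula is convenient for hand calculation; on a computer it
can lose accuracy when the mean is very large compared with the spread, so this
is a demonstration of the formula and not a recommendation for general
numerical work.

```python
import math


def std_dev_from_sums(values):
    """Population standard deviation using formula (5.6)."""
    n = len(values)
    sum_x = sum(values)                     # sum of the scores
    sum_x2 = sum(x * x for x in values)     # sum of the squared scores
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


set_2 = [9, 3, 8, 8, 9, 7, 8, 9, 18]

total = sum(set_2)
total_of_squares = sum(x * x for x in set_2)
sigma_2 = std_dev_from_sums(set_2)

print("sum of x =", total)
print("sum of x^2 =", total_of_squares)
print("standard deviation =", round(sigma_2, 2))
```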

(b) Sometimes the population mean is not needed and we are only required to
find the standard deviation. Formula (5.7) below, which does not involve $\mu$,
can be used instead. In this formula, A is an assumed mean, which is an
arbitrary number. You can select A either from the numbers in the data set or
as any convenient number you like.

\[ \sigma = \sqrt{\frac{\sum (x_i - A)^2}{n} - \left(\frac{\sum (x_i - A)}{n}\right)^2} \qquad (5.7) \]

Example 5.5

Obtain the standard deviation of data Set 1 in Example 5.1.

Solution

Let us select the number 10 in the data set as the assumed mean A. Then
$\sum (x_i - A) = -9$ and $\sum (x_i - A)^2 = 217$.

By using formula (5.7), the standard deviation is

\[ \sigma = \sqrt{\frac{217}{9} - \left(\frac{-9}{9}\right)^2} = 4.807 \approx 4.81 \]

For comparison, suppose we choose an arbitrary number A = 5. Then
$\sum (x_i - A) = 36$ and $\sum (x_i - A)^2 = 352$, and the standard deviation
is given by

\[ \sigma = \sqrt{\frac{352}{9} - \left(\frac{36}{9}\right)^2} = 4.807 \]

We notice that the two values of the assumed mean A give the same value of
standard deviation.
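
The observation in Example 5.5 can be checked with a small Python sketch of
formula (5.7). The function name std_dev_assumed_mean is hypothetical, and the
values A = 10 and A = 5 are simply those used in the example.

```python
import math


def std_dev_assumed_mean(values, assumed_mean):
    """Population standard deviation using formula (5.7)."""
    n = len(values)
    shifted = [x - assumed_mean for x in values]    # x_i - A
    sum_shifted = sum(shifted)
    sum_shifted_squared = sum(d * d for d in shifted)
    return math.sqrt(sum_shifted_squared / n - (sum_shifted / n) ** 2)


set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]

sigma_a10 = std_dev_assumed_mean(set_1, 10)
sigma_a5 = std_dev_assumed_mean(set_1, 5)

print("A = 10:", round(sigma_a10, 3))
print("A = 5: ", round(sigma_a5, 3))
```

Both calls return the same value up to rounding error, because shifting every
observation by the same constant A does not change how far the observations lie
from one another.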

Standard Deviation and Variance of Grouped Data

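Standard deviation can be calculated through the following formula:

\[ \sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} \qquad (5.8) \]

where $x_i$ is the class mid-point of the ith class whose frequency is $f_i$,
and $n = \sum f_i$ is the total number of observations.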
Example 5.6

Obtain the standard deviation of the weekly book sales given in Table 2.6
presented in Topic 2.

Solution

We need to include a new column for the product $f x^2$ as follows (f x means f
multiplied by x, and f x^2 means f multiplied by x^2):

Class       Class Mid-point (x)   Frequency (f)   f x       f x^2
34 - 43     38.5                  2               77        2964.5
44 - 53     48.5                  5               242.5     11761.25
54 - 63     58.5                  12              702       41067
64 - 73     68.5                  18              1233      84460.5
74 - 83     78.5                  10              785       61622.5
84 - 93     88.5                  2               177       15664.5
94 - 103    98.5                  1               98.5      9702.25
Sum                               50              3315      227242.5

The standard deviation is:

\[ \sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} = \sqrt{\frac{227242.5}{50} - \left(\frac{3315}{50}\right)^2} = 12.21 \approx 12 \text{ books.} \]

The variance is $\sigma^2 = 149.16 \approx 149$ (in squared units, books squared).
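
The table of Example 5.6 can be reproduced with the following Python sketch of
formula (5.8). The list of (mid-point, frequency) pairs is copied from the
table above; the variable names are illustrative assumptions.

```python
import math

# (class mid-point x, frequency f) for each class in Example 5.6.
classes = [
    (38.5, 2),
    (48.5, 5),
    (58.5, 12),
    (68.5, 18),
    (78.5, 10),
    (88.5, 2),
    (98.5, 1),
]

n = sum(f for _, f in classes)                  # total frequency
sum_fx = sum(f * x for x, f in classes)         # column f x
sum_fx2 = sum(f * x * x for x, f in classes)    # column f x^2

grouped_variance = sum_fx2 / n - (sum_fx / n) ** 2
grouped_sigma = math.sqrt(grouped_variance)

print("n =", n)
print("sum of f x =", sum_fx)
print("sum of f x^2 =", sum_fx2)
print("variance =", round(grouped_variance, 2))
print("standard deviation =", round(grouped_sigma, 2))
```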

ACTIVITY 5.3
Do you think obtaining standard deviation for grouped data is easier than
for ungrouped data? Justify your answer.

Coefficient of Variation

When we want to compare the dispersion of two data sets with different units,
such as data for age and weight, the variance is not appropriate because it has
a unit. The coefficient of variation, V, given below is dimensionless and is
therefore more appropriate.

\[ V = \frac{\text{Standard Deviation}}{\text{Mean}} = \frac{\sigma}{\mu} \qquad (5.9) \]

The comparison is more meaningful because each standard deviation is measured
relative to its respective mean.
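
As a small illustration of formula (5.9), the sketch below computes the
coefficient of variation from the means and standard deviations already
obtained in Example 5.3 and Example 5.6. Pairing these two examples is only an
assumption made to have two sets of figures to compare; the function name
coefficient_of_variation is likewise illustrative.

```python
import math


def coefficient_of_variation(mean, std_dev):
    """V = standard deviation / mean; the result has no unit."""
    return std_dev / mean


# Example 5.3: mean 40, variance 200.
cv_example_5_3 = coefficient_of_variation(40, math.sqrt(200))

# Example 5.6: mean 3315/50, variance from formula (5.8).
mean_5_6 = 3315 / 50
sigma_5_6 = math.sqrt(227242.5 / 50 - mean_5_6 ** 2)
cv_example_5_6 = coefficient_of_variation(mean_5_6, sigma_5_6)

print("V for Example 5.3:", round(cv_example_5_3, 3))
print("V for Example 5.6:", round(cv_example_5_6, 3))
```

Relative to its mean, the data of Example 5.3 is the more dispersed of the two,
even though the two standard deviations (about 14.14 and 12.21) are of similar
size.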

EXERCISE 5.4
Referring to data Sets E, F and G in Exercise 5.2:
(a) Calculate the standard deviation and coefficient of variation; and
(b) Compare their data spread.

5.5 SKEWNESS
In a real situation we may have a distribution which is symmetric such as in
Figure 5.1, case (a), negatively skewed such as in Figure 5.1, case (b), or
positively skewed such as in Figure 5.1, case (c). Sometimes we need to measure
the degree of skewness. For that, we will use the coefficient of skewness given
in the following section.

Coefficient of Skewness

Pearson's Coefficient of Skewness

For a skewed distribution, the mean tends to lie on the same side of the mode as
the longer tail [see Figure 5.1, case (b) and case (c)]. Thus, a measure of the
asymmetry is supplied by the difference (Mean - Mode). We have the following
dimensionless coefficients of skewness:

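- Pearson's First Coefficient of Skewness

\[ PCS(1) = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{x} - \hat{x}}{s} \qquad (5.10) \]

- Pearson's Second Coefficient of Skewness
If we do not have the value of the mode then, by using formula (4.6) in Topic 4,
we have the following second measure of skewness:

\[ PCS(2) = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(\bar{x} - \tilde{x})}{s} \qquad (5.11) \]

Here $\bar{x}$, $\hat{x}$, $\tilde{x}$ and $s$ denote the mean, the mode, the
median and the standard deviation respectively.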
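
To see the two coefficients at work, the sketch below applies formulas (5.10)
and (5.11) to Set 1 of Example 5.1, which has a single mode. It is a minimal
illustration that assumes the population standard deviation is used, in line
with the rest of this module; the mean, median, mode and standard deviation come
from the statistics module of the Python standard library.

```python
import statistics

set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]

mean = statistics.mean(set_1)
median = statistics.median(set_1)
mode = statistics.mode(set_1)           # 5 is the only repeated value
std_dev = statistics.pstdev(set_1)      # population standard deviation

# Pearson's first coefficient: (mean - mode) / standard deviation.
pcs_1 = (mean - mode) / std_dev

# Pearson's second coefficient: 3 (mean - median) / standard deviation.
pcs_2 = 3 * (mean - median) / std_dev

print("mean, median, mode:", mean, median, mode)
print("PCS(1) =", round(pcs_1, 2))
print("PCS(2) =", round(pcs_2, 2))
```

Both coefficients are positive for this data set, which is consistent with the
longer tail of Set 1 lying to the right of the mode.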

EXERCISE 5.5
Given the frequency tables of two distributions as follows:

Distribution A:
Weight (kg)        20-29  30-39  40-49  50-59  60-69  70-79  80-89
No. of Students        4     10     20     30     25     10      1

Distribution B:
Marks              20-29  30-39  40-49  50-59  60-69  70-79  80-89
No. of Students       10     20     30     20     15      4      1

Make a comparison of the above distributions based on the following
statistics:
(a) Obtain the mean, mode, median, $Q_1$, $Q_3$, standard deviation and
coefficient of variation.
(b) Obtain Pearson's coefficient of skewness and comment on the
values obtained.

- In this topic, you have studied various measures of dispersion which can be
used to describe the shape of a frequency curve.
- The overall range cannot explain the pattern of observations lying between
the minimum and the maximum values.
- Thus we introduced the inter quartile range (IQR) to measure the dispersion
of the data in the middle 50%, or the main body.
- The variance, which is also called the shape parameter, is another measure of
dispersion.
- However, for comparison of two sets of data which have different units, the
coefficient of variation is used.
- This coefficient is preferred because it is dimensionless.
- Finally, Pearson's coefficient of skewness is given to measure the degree of
skewness of a non-symmetric distribution.
