
Measures of Dispersion

Range
Quartile, interquartile range, semi-interquartile range
Variance
Standard deviation

Range
Range = Largest observation - smallest observation

For example,

In U6A, the highest mark for Maths is 82 while the lowest is 26. The range is 56.
In U6B, the highest mark for Maths is 75 while the lowest is 38. The range is 37.
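The range needs only the two extreme observations. The short sketch below (the function name `data_range` is our own) checks the two class examples. Only the highest and lowest marks are given, so only those two values are passed in.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

# Only the extreme marks are given in the example, and they are all the range needs.
print(data_range([82, 26]))  # U6A: 56
print(data_range([75, 38]))  # U6B: 37
```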

Quartile
Quartiles are values which divide a set of data arranged in ascending or descending order into four equal parts.
The first quartile Q1, or the lower quartile, is the value such that 1/4 of the total number of data have values less than Q1.
The second quartile Q2 is the median.
The third quartile Q3, or the upper quartile, is the value such that 3/4 of the total number of data have values less than Q3.

Quartile, interquartile range & semi-interquartile range for ungrouped data
To find the quartiles, first arrange the data in ascending order, as in the following example:
23, 47, 32, 34, 42, 35, 44, 36, 52, 40, 42, 46
In ascending order: 23, 32, 34, 35, 36, 40, 42, 42, 44, 46, 47, 52
We also have:
Interquartile range = Q3 - Q1
Semi-interquartile range = ½(Q3 - Q1)
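Textbooks use several slightly different rules for locating quartiles in ungrouped data. The sketch below assumes one common rule: the k-th quartile sits at position kn/4 of the sorted data. If that position is a whole number, take the average of that observation and the next one. Otherwise, round the position up. The function name `quartile` is illustrative only. For the 12 marks above, the median-of-halves method happens to give the same values.

```python
def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) of ungrouped data.

    Assumed convention: position p = k*n/4 in the sorted data.
    If p is a whole number, average the p-th and (p+1)-th values;
    otherwise take the value at position ceil(p).
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 4 == 0:
        i = k * n // 4
        return (xs[i - 1] + xs[i]) / 2   # 1-based positions i and i+1
    return xs[(k * n + 3) // 4 - 1]      # 1-based position ceil(k*n/4)

marks = [23, 47, 32, 34, 42, 35, 44, 36, 52, 40, 42, 46]
q1, q2, q3 = (quartile(marks, k) for k in (1, 2, 3))
iqr = q3 - q1          # interquartile range
siqr = iqr / 2         # semi-interquartile range
print(q1, q2, q3, iqr, siqr)  # 34.5 41.0 45.0 10.5 5.25
```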

Quartile, Interquartile range & semi-interquartile range for grouped data
(i) Interpolation method
First quartile, $Q_1 = \left(\tfrac{1}{4}\sum f\right)$th observation $= L_B + \left(\dfrac{\tfrac{1}{4}\sum f - F_B}{f}\right) c$
where $L_B$ = lower class boundary of the class containing the first quartile,
$F_B$ = cumulative frequency before the class containing the first quartile,
$f$ = frequency of the class containing the first quartile,
$c$ = width of the class containing the first quartile.

Third quartile, $Q_3 = \left(\tfrac{3}{4}\sum f\right)$th observation $= L_B + \left(\dfrac{\tfrac{3}{4}\sum f - F_B}{f}\right) c$
where $L_B$ = lower class boundary of the class containing the third quartile,
$F_B$ = cumulative frequency before the class containing the third quartile,
$f$ = frequency of the class containing the third quartile,
$c$ = width of the class containing the third quartile.
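The two interpolation formulas differ only in the fraction of Σf used (1/4 or 3/4). The sketch below therefore takes that fraction as a parameter. The frequency table is hypothetical, since the slides give no grouped data here. Classes are written as (lower boundary, upper boundary, frequency), and the names are illustrative. The same interpolation with a fraction of p/100 is often used for percentiles, but that is an extension beyond the slide.

```python
def grouped_position_value(classes, fraction):
    """Value at the (fraction * sum f)-th observation, by linear interpolation.

    classes: list of (LB, upper boundary, f) tuples in ascending class order.
    fraction: 1/4 for Q1, 1/2 for the median, 3/4 for Q3.
    """
    total = sum(f for _, _, f in classes)
    target = fraction * total          # position of the required observation
    cum_before = 0                     # F_B: cumulative frequency before the class
    for lb, ub, f in classes:
        if f > 0 and cum_before + f >= target:
            c = ub - lb                # class width
            return lb + (target - cum_before) / f * c
        cum_before += f
    raise ValueError("position lies outside the table")

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 (sum f = 30)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
q1 = grouped_position_value(classes, 1 / 4)   # 10 + (7.5 - 5)/8 * 10 = 13.125
q3 = grouped_position_value(classes, 3 / 4)   # 20 + (22.5 - 13)/12 * 10
print(q1, q3, q3 - q1)
```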

Quartile, percentile
Besides quartiles, we can also talk about percentiles, which divide the ordered data into 100 equal parts.
For example, the 15th percentile is the value below which 15% of the data lie.

Variance
Consider the data:
3, 4, 5, 6, 7.
Mean $\bar{x} = (3 + 4 + 5 + 6 + 7)/5 = 5$
We wish to study how the data deviate from the mean, so we find $(x_i - \bar{x})$ for each of the data.
Unfortunately, $\sum_{i=1}^{n} (x_i - \bar{x})$ is always zero for any set of data, because $\sum_{i=1}^{n} x_i = n\bar{x}$.
To overcome this problem, we use the squares of these values, $(x_i - \bar{x})^2$.
The result:
variance, $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
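The following sketch works through the data 3, 4, 5, 6, 7 step by step. It shows that the raw deviations cancel to zero while the squared deviations do not. The function name is our own.

```python
def deviations_and_variance(data):
    """Return the mean, the deviations x_i - mean, their sum, and the variance."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    squared = [d * d for d in deviations]          # (x_i - mean)^2
    return mean, deviations, sum(deviations), sum(squared) / n

mean, devs, dev_sum, var = deviations_and_variance([3, 4, 5, 6, 7])
print(mean, devs, dev_sum, var)  # 5.0 [-2.0, -1.0, 0.0, 1.0, 2.0] 0.0 2.0
```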

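Prove that $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$.
Proof:
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)$
$= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2$
$= \sum_{i=1}^{n} x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2$, since $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ gives $\sum_{i=1}^{n} x_i = n\bar{x}$
$= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$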

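Variance for ungrouped data
For ungrouped data, the variance is:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
$s^2 = \dfrac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2$
$s^2 = \dfrac{\sum_{i=1}^{n} x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} x_i}{n}\right)^2$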
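The sketch below compares the definition and the shortcut formula on the 12 marks from the quartile example. The function names are our own. In exact arithmetic the two forms are equal. In floating point, the shortcut can lose precision when the values are large compared with their spread, because it subtracts two nearly equal numbers.

```python
def variance_definition(data):
    """s^2 = sum (x_i - mean)^2 / n"""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """s^2 = sum x_i^2 / n - (sum x_i / n)^2"""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

marks = [23, 47, 32, 34, 42, 35, 44, 36, 52, 40, 42, 46]
print(variance_definition(marks), variance_shortcut(marks))
print(variance_definition([3, 4, 5, 6, 7]))  # 2.0
```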

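Variance for grouped data
For grouped data, the mid-point of each class, $x_i$, is used to represent the class.
So, the variance is given by:
$s^2 = \dfrac{\sum f (x_i - \bar{x})^2}{\sum f}$
$s^2 = \dfrac{\sum f x_i^2}{\sum f} - \bar{x}^2$
$s^2 = \dfrac{\sum f x_i^2}{\sum f} - \left(\dfrac{\sum f x_i}{\sum f}\right)^2$
Prove this.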
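The following is one possible proof. It follows the same steps as the ungrouped proof, with each term weighted by its class frequency f. It assumes the grouped mean $\bar{x} = \sum f x_i / \sum f$.

```latex
\begin{aligned}
\frac{\sum f (x_i - \bar{x})^2}{\sum f}
  &= \frac{\sum f \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)}{\sum f} \\
  &= \frac{\sum f x_i^2}{\sum f} - 2\bar{x}\,\frac{\sum f x_i}{\sum f} + \bar{x}^2\,\frac{\sum f}{\sum f} \\
  &= \frac{\sum f x_i^2}{\sum f} - 2\bar{x}^2 + \bar{x}^2
     \qquad \left(\text{since } \bar{x} = \frac{\sum f x_i}{\sum f}\right) \\
  &= \frac{\sum f x_i^2}{\sum f} - \bar{x}^2
   = \frac{\sum f x_i^2}{\sum f} - \left(\frac{\sum f x_i}{\sum f}\right)^2
\end{aligned}
```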

Standard Deviation
In the process of finding the variance, we have squared the deviations. This means that the variance is measured in the square of the unit of the data.
For example, if the unit of the data is cm, the unit of the variance is cm².
So the variance is not convenient for describing spread in the same unit as the data.
Instead, we take its square root and call it the standard deviation, which has the same unit as the data.

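Standard Deviation for ungrouped data
For ungrouped data, the standard deviation is:
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2} = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} x_i}{n}\right)^2}$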
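This slide divides by n, which is what Python's standard library calls the population standard deviation (`statistics.pstdev`). The sketch below checks the formula against that function. It also shows `statistics.stdev`, which divides by n - 1 and so gives a different value. The helper `std_dev` is our own.

```python
import math
import statistics

def std_dev(data):
    """s = sqrt( sum (x_i - mean)^2 / n ), dividing by n as on the slide."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

data = [3, 4, 5, 6, 7]
print(std_dev(data))            # sqrt(2), about 1.414
print(statistics.pstdev(data))  # population SD: divides by n, matches std_dev
print(statistics.stdev(data))   # sample SD: divides by n - 1, so it is larger
```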

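Standard Deviation for grouped data
For grouped data, the standard deviation is:
$s = \sqrt{\dfrac{\sum f (x_i - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{\sum f x_i^2}{\sum f} - \bar{x}^2} = \sqrt{\dfrac{\sum f x_i^2}{\sum f} - \left(\dfrac{\sum f x_i}{\sum f}\right)^2}$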
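The sketch below applies both grouped forms to a hypothetical table with classes 0-10, 10-20, 20-30 and 30-40. The mid-points are 5, 15, 25, 35 and the frequencies are 5, 8, 12, 5; none of these data come from the slides. Both forms give a variance of 821/9 ≈ 91.22, so the standard deviation is about 9.55.

```python
import math

def grouped_std_dev(midpoints, freqs):
    """Return (definition form, shortcut form) of the grouped standard deviation."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total
    # definition: sqrt( sum f (x - mean)^2 / sum f )
    s_def = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total)
    # shortcut: sqrt( sum f x^2 / sum f - mean^2 )
    s_short = math.sqrt(sum(f * x * x for x, f in zip(midpoints, freqs)) / total - mean ** 2)
    return s_def, s_short

mids = [5, 15, 25, 35]   # hypothetical class mid-points
freqs = [5, 8, 12, 5]    # hypothetical frequencies
s_def, s_short = grouped_std_dev(mids, freqs)
print(s_def, s_short)    # both about 9.55
```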

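Standard deviation & variance by coding method
This is similar to the coding method for calculating the mean. Let
$y = \dfrac{x - k}{h}$
where k is the assumed mean and h is the scaling factor.
Standard deviation of y:
$s_y = \sqrt{\dfrac{\sum f (y - \bar{y})^2}{\sum f}}$
Since $\bar{y} = \dfrac{\bar{x} - k}{h}$,
$s_y^2 = \dfrac{\sum f \left(\dfrac{x - k}{h} - \dfrac{\bar{x} - k}{h}\right)^2}{\sum f} = \dfrac{1}{h^2} \cdot \dfrac{\sum f (x - \bar{x})^2}{\sum f} = \dfrac{1}{h^2} s_x^2$
So $s_x^2 = h^2 s_y^2$ and $s_x = h s_y$ (taking h > 0).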
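The sketch below checks numerically that $s_x = h s_y$. It uses the same kind of hypothetical table as before, with mid-points 5, 15, 25, 35 and frequencies 5, 8, 12, 5. The choice k = 25 and h = 10 is also hypothetical. With it, the coded values are the small integers -2, -1, 0, 1.

```python
import math

def grouped_sd(values, freqs):
    """sqrt( sum f (v - mean)^2 / sum f ) for values with frequencies."""
    total = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / total
    return math.sqrt(sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / total)

mids = [5, 15, 25, 35]    # hypothetical class mid-points x
freqs = [5, 8, 12, 5]     # hypothetical frequencies f
k, h = 25, 10             # hypothetical assumed mean and scaling factor
coded = [(x - k) / h for x in mids]   # y = (x - k) / h
s_y = grouped_sd(coded, freqs)
s_x = grouped_sd(mids, freqs)
print(coded, s_x, h * s_y)  # s_x and h * s_y agree up to rounding
```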

Symmetry & skewness of data distribution
(a) Symmetrical distribution (bell-shaped)
Mean = median = mode
The normal distribution is an example of such a distribution.

(b) Positively skewed distribution (skewed to the right)
Mode < median < mean
Here the mean is greater than the mode.

(c) Negatively skewed distribution (skewed to the left)
Mean < median < mode
Here the mean is less than the mode.

Box-and-whisker plots (Boxplots)
This is another graphical representation of data.
(a) Horizontal box-and-whisker plot:
[Figure: horizontal boxplot labelled, from left to right, lowest value, lower quartile Q1, median Q2, upper quartile Q3, highest value]

(b) Vertical box-and-whisker plot
[Figure: vertical boxplot labelled, from top to bottom, highest value, upper quartile Q3, median Q2, lower quartile Q1, lowest value]

The box extends from Q1 to Q3 and encloses the middle 50% of the data.
The whiskers extend from the box to
the lowest and highest values and
illustrate the range of the data.

Comparison between frequency curves and boxplots
(a) Symmetrical distribution
[Figure: symmetrical frequency curve above a boxplot marking Q1, Q2, Q3]
The left and right whiskers have equal lengths and the median lies in the middle of the box.

(b) Positively skewed distribution
[Figure: positively skewed frequency curve above a boxplot marking Q1, Q2, Q3]
The left whisker is shorter than the right whisker and the median lies closer to the lower quartile.

(c) Negatively skewed distribution
[Figure: negatively skewed frequency curve above a boxplot marking Q1, Q2, Q3]
The left whisker is longer than the right whisker and the median lies closer to the upper quartile.

Example of boxplot
The stem-and-leaf plot below shows the number of flies caught in an insect trap for 28 days.
0 | 1 1 2
1 | 2 3 5 5 5 6
2 | 2 2 3 5 8 8
3 | 4 4 4 4 5 7 7 8
4 | 2 6 7 7 8
key: 1 | 2 means 12 flies
(a) Illustrate the data by drawing a boxplot.
(b) Use your boxplot to comment on the type of distribution.

(a) From the data in the stem-and-leaf plot, the lowest value is 1, the lower quartile = 15, the median = 28, the upper quartile = 37, and the highest value = 48.
[Figure: boxplot on a scale from 0 to 50 flies, with the box from 15 to 37, the median at 28, and whiskers extending to 1 and 48]

(b) The left whisker is longer than the right whisker and the median lies closer to the upper quartile. Therefore, the distribution is negatively skewed.
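The sketch below rebuilds the 28 values from the stem-and-leaf plot (stem = tens, leaf = units). It computes the five-number summary using the same assumed quartile rule as in the ungrouped-data sketch: position kn/4, averaging two values when the position is whole. The result matches the values quoted in (a). It also compares the whisker lengths and the median's position within the box, which is the check used in (b). All names are illustrative.

```python
def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3); assumed rule: position k*n/4, average if whole."""
    n = len(sorted_data)
    if (k * n) % 4 == 0:
        i = k * n // 4
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[(k * n + 3) // 4 - 1]

stems = {0: [1, 1, 2], 1: [2, 3, 5, 5, 5, 6], 2: [2, 2, 3, 5, 8, 8],
         3: [4, 4, 4, 4, 5, 7, 7, 8], 4: [2, 6, 7, 7, 8]}
flies = sorted(10 * stem + leaf for stem, leaves in stems.items() for leaf in leaves)
summary = (flies[0], quartile(flies, 1), quartile(flies, 2), quartile(flies, 3), flies[-1])
print(len(flies), summary)  # 28 (1, 15.0, 28.0, 37.0, 48)

low, q1, q2, q3, high = summary
left_whisker, right_whisker = q1 - low, high - q3   # 14.0 and 11.0
print(left_whisker > right_whisker, (q3 - q2) < (q2 - q1))  # True True -> negatively skewed
```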

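Using the boxplot to eliminate outliers
Sometimes extreme values (values that are too small or too large) appear in a set of data. These extreme values are called outliers.
Data more than 1.5 times the interquartile range below Q1, or more than 1.5 times the interquartile range above Q3, are known as outliers. That is, outliers are values below the lower boundary Q1 - 1.5(Q3 - Q1) or above the upper boundary Q3 + 1.5(Q3 - Q1).
[Figure: number line marking the lower boundary 1.5(Q3 - Q1) below Q1, then Q1, Q2, Q3, and the upper boundary 1.5(Q3 - Q1) above Q3; values beyond either boundary are outliers]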

Outliers - example
Grades of 48 students for a certain subject:

[Table: Grade against No. of students]
Median, Q2 = 3
Lower quartile, Q1 = 2
Upper quartile, Q3 = 4.5
Lower boundary = Q1 - 1.5(Q3 - Q1) = 2 - 1.5(4.5 - 2) = -1.75
Upper boundary = Q3 + 1.5(Q3 - Q1) = 4.5 + 1.5(4.5 - 2) = 8.25
So the outlier is 9, since 9 > 8.25.
The whisker is drawn from 1 to 8.
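The boundary calculation reduces to two lines of arithmetic, as in the sketch below. The function name `outlier_fences` is our own. The grade values 1, 8 and 9 are the ones mentioned in the example; the full grade table is not available, so only those three values are tested.

```python
def outlier_fences(q1, q3):
    """Lower and upper boundaries: Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = outlier_fences(2, 4.5)
outliers = [g for g in [1, 8, 9] if g < lower or g > upper]
print(lower, upper)  # -1.75 8.25
print(outliers)      # [9]
```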

Further example
