STATISTICS
BFC 34303
Chapter 1 :
Review on Descriptive Statistics
INTRODUCTION
These are Mathematics marks
for 30 students who are taking
Test 1
WHAT IS A STATISTIC?
~ A statistic is a numerical measurement
describing some characteristic of a
sample
~ E.g.: the sample mean, the sample variance
WHAT IS A VARIABLE?
~ Any measured characteristic or
attribute that differs for different
elements
~ Example:
(i) 3,5,6,2,5,2,4,6,5
MEASURES OF LOCATION
PERCENTILE QUARTILE
MEASURES OF LOCATION
( CENTRAL TENDENCY)
MEAN
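Given a data set $x_1, x_2, x_3, \ldots, x_n$, the mean $\bar{x}$ is defined as
\[ \bar{x} = \frac{\text{sum of all observations}}{\text{number of observations}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
For a set of data which can be represented in a frequency distribution table, the mean is given by
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]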
CENTRAL TENDENCY
In general terms, central tendency
(mean, median, and mode) is a statistical
measure that determines a single value
that accurately describes the center of the
distribution and represents the entire
distribution of scores.
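\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \cdots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05 \]
OR
x   0   1   2   3   4   5
f   2   5   7   3   2   1
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = \frac{41}{20} = 2.05 \]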
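The frequency-table calculation above can be checked with a short Python sketch. The function name `frequency_mean` is only an illustrative choice, not something defined in these notes.

```python
def frequency_mean(values, freqs):
    """Mean of data given as a frequency table: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


x = [0, 1, 2, 3, 4, 5]   # distinct values from the table
f = [2, 5, 7, 3, 2, 1]   # their frequencies (20 observations in total)
print(frequency_mean(x, f))  # 41 / 20 = 2.05
```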
Example: To obtain grade A, Saleha must achieve an average
of at least 75 marks in four tests. If her average
mark for the first three tests is 70, calculate the
lowest mark she must get in her fourth test in order
to obtain grade A.
Solution:
Let the marks for the four tests be w, x, y, z
Mean for w, x, y : 70, so w + x + y = 3(70) = 210
Mean for w, x, y, z :
\[ \frac{3(70) + z}{4} \geq 75 \]
\[ \frac{210 + z}{4} \geq 75 \]
\[ 210 + z \geq 300 \]
\[ z \geq 90 \]
So, the lowest mark she must get in her fourth test in order to obtain grade A is 90.
MEDIAN
The median is the middle value of a set of data that is arranged in
order of magnitude.
Let $x_{(k)}$ be the $k$th observation in a set of data which has been
arranged in ascending or descending order.
For example, consider the following set of numbers
9 2 7 10 5 16
After arrangement, it becomes
2 5 7 9 10 16
Thus, the median lies between $x_{(3)} = 7$ and $x_{(4)} = 9$,
so the median is 8
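The median of a data set $x_1, x_2, \ldots, x_n$ is denoted
by $x_m$ and may be calculated as:
\[ x_m = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[4pt] \dfrac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right), & \text{if } n \text{ is even} \end{cases} \]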
Example: Find the median for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.56, 2.7, 5.48, 8.61, 4.35, 6.22
Solution:
a) The data arranged in ascending order :
17 , 20 , 21 , 24 , 28 , 32 , 36
Since n = 7 , which is odd, the median is
\[ x_m = x_{\left(\frac{n+1}{2}\right)} = x_{(4)} = 24 \]
b) The data arranged in ascending order :
2.71 , 3.56 , 4.35 , 5.48 , 6.22 , 8.61
Since n = 6 , which is even, the median is
\[ x_m = \frac{1}{2}\left( x_{\left(\frac{6}{2}\right)} + x_{\left(\frac{6}{2}+1\right)} \right) = \frac{1}{2}\left( x_{(3)} + x_{(4)} \right) = \frac{1}{2}(4.35 + 5.48) = 4.915 \]
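As a rough check of the odd and even cases, the rule can be sketched in Python. The function name `median` is illustrative, and the sketch follows the definition in these notes rather than any particular library.

```python
def median(data):
    """Median by the rule above: the middle value if n is odd,
    the average of the two middle values if n is even."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # the (n+1)/2-th observation (1-based)
    return (xs[mid - 1] + xs[mid]) / 2  # mean of the n/2-th and (n/2 + 1)-th


print(median([21, 24, 17, 28, 36, 20, 32]))           # odd n: 24
print(median([3.56, 2.7, 5.48, 8.61, 4.35, 6.22]))    # even n: about 4.915
```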
MODE
a) 2, 3, 3, 4, 5, 28, 5, 5
b) 2, 3, 5, 8, 10
2
(ii) If r is not an integer, then round up to the next
integer.
Q2 is also called median.
Interquartile Range = Q3 - Q1
PERCENTILES
Percentiles divide a set of data which are arranged in
ascending order into 100 equal parts.
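To find the percentile $P_k$:
Let
\[ r = \frac{k}{100}\, n \]
where : n = number of observations
k = percentile for $P_k$
(i) If r is an integer:
\[ P_k = \frac{1}{2}\left( r\text{th observation} + (r+1)\text{th observation} \right) \]
(ii) If r is not an integer, then round up to the next
integer; $P_k$ is the observation at that position.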
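Third quartile, $Q_3$:
\[ r = \frac{k}{4}\, n = \frac{3}{4}(7) = 5.25 \quad (\text{not an integer}) \]
$Q_3$ = 6th observation = 32
40th percentile, $P_{40}$:
\[ r = \frac{k}{100}\, n = \frac{40}{100}(7) = 2.8 \quad (\text{not an integer}) \]
$P_{40}$ = 3rd observation = 21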
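The position rule used in this example can be sketched in Python as follows. The function names are illustrative, and the rule is the one stated in these notes; statistical libraries often use different interpolation conventions, so their results may differ.

```python
def percentile(data, k):
    """k-th percentile (1 <= k <= 99) by the rule r = k * n / 100."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    # Integer arithmetic avoids floating-point trouble when testing
    # whether r is an integer.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # r is an integer: average the r-th and (r+1)-th observations.
        return (xs[whole - 1] + xs[whole]) / 2
    # r is not an integer: round up, i.e. take the (whole + 1)-th observation.
    return xs[whole]


def quartile(data, k):
    """k-th quartile (k = 1, 2, 3), computed as the (25 * k)-th percentile."""
    return percentile(data, 25 * k)


data = [21, 24, 17, 28, 36, 20, 32]
print(quartile(data, 3))     # r = 5.25, so the 6th observation: 32
print(percentile(data, 40))  # r = 2.8, so the 3rd observation: 21
```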
Example :
MEASURES OF DISPERSION
RANGE
Example:
Data 1: 6, 7, 8, 6, 9, 6     mean = 7
Data 2: 5, 7, 2, 6, 13, 9    mean = 7
Variability
The goal for variability is to obtain a
measure of how spread out the
scores are in a distribution.
A measure of variability usually
accompanies a measure of central
tendency as basic descriptive
statistics for a set of scores.
MEASURES OF DISPERSION
REMARK
Range is not a good measure of dispersion because it
is influenced by the extreme values and the
calculation does not cover all observations.
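VARIANCE
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \quad \text{for } i = 1, 2, \ldots, n \]
Commonly used formulae
\[ S^2 = \frac{\sum x_i^2 - n\bar{X}^2}{n - 1} \qquad S^2 = \frac{\sum f_i x_i^2 - n\bar{X}^2}{n - 1} \]
\[ S^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} \qquad S^2 = \frac{\sum f_i x_i^2 - \dfrac{\left(\sum f_i x_i\right)^2}{n}}{n - 1} \]
In the formulae with frequencies, $n = \sum f_i$.
STANDARD DEVIATION
\[ S = \sqrt{\text{VARIANCE}} \]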
Example:
Calculate the variance and standard deviation for the
following sets of sample data. Hence, determine which data
is more disperse about the mean.
Set 1 : 16,10,9,2,5,2,7
Set 2 : 10,32,8,12,14,36,20,8,40,4,32,1
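For Data 1:
Data 1 : 16,10,9,2,5,2,7
x_i     x_i^2
2       4
2       4
5       25
7       49
9       81
10      100
16      256
\[ \sum_{i=1}^{n} x_i = 51, \qquad \sum_{i=1}^{n} x_i^2 = 519 \]
\[ S^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{519 - \dfrac{51^2}{7}}{6} = 24.5714 \]
\[ S = \sqrt{24.5714} = 4.957 \]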
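For Data 2:
Data 2 : 10,32,8,12,14,36,20,8,40,4,32,1
\[ \sum_{i=1}^{n} x_i = 217, \qquad \sum_{i=1}^{n} x_i^2 = 5929 \]
\[ S^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{5929 - \dfrac{217^2}{12}}{11} = 182.265 \]
\[ S = \sqrt{182.265} = 13.5 \]
Hence, data 2 is more dispersed than data 1.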
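The two calculations can be reproduced with a small Python sketch of the computational formula. The function names are illustrative; the sketch assumes sample data, hence the divisor n - 1.

```python
import math


def sample_variance(data):
    """Sample variance: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))


set1 = [16, 10, 9, 2, 5, 2, 7]
set2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]
for name, data in (("Set 1", set1), ("Set 2", set2)):
    print(name, round(sample_variance(data), 3), round(sample_std(data), 3))
```

Set 2 gives the larger standard deviation, which agrees with the conclusion that it is more dispersed about its mean.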
STEM-AND-LEAF DIAGRAMS
Used to extract every data value in dataset.
the leaves.
To construct a stem-and-leaf diagram:
The distribution shows that most data are clustered at the right.
The left tail extends farther from the data centre than the right
tail. Therefore, the distribution is skewed to the left or
negatively skewed.
Example:
Marks of a recent Mathematics test are as given below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:
(a) Construct a stem-and-leaf diagram.
(b) What are the highest and lowest marks?
(c) Interpret the distribution.
Solution:
(a) Mathematics Test Mark
Stem | Leaf
4    | 2
5    |
6    | 7
7    | 3 8
8    | 2 4 6
9    | 1 4 9
Key: 9 | 9 means 99 marks
(b) Highest mark = 99, Lowest mark = 42
(c) Negatively skewed
Example:
Given the heights of 20 people are as follows:
154, 143, 148, 139, 143, 147, 153,
162, 136, 147, 144, 143, 139, 142,
143, 156, 151, 164, 157, 149.
Construct a stem-and-leaf diagram and state the shortest and
tallest height. Interpret the distribution.
Solution:
Stem | Leaf
13   | 6 9 9
14   | 2 3 3 3 3 4 7 7 8 9
15   | 1 3 4 6 7
16   | 2 4
Key: 13 | 6 means 136 cm
Shortest height = 136 cm
Tallest height = 164 cm
Positively skewed
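A stem-and-leaf diagram like the two above can be produced with a brief Python sketch. It assumes non-negative integer data with the last digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves.

    Stems with no data between the smallest and largest stem are kept
    with an empty leaf list, as in a hand-drawn diagram.
    """
    stems = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        stems.setdefault(stem, []).append(leaf)
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


marks = [73, 42, 67, 78, 99, 84, 91, 82, 86, 94]
for stem, leaves in stem_and_leaf(marks).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```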
Exercise:
[Figure: horizontal and vertical box-and-whisker plots, each marking min, Q1, Q2, Q3 and max]
BOX-AND-WHISKER PLOTS
To construct a box-and-whisker plot:
[Figure: box-and-whisker plot on a 10 to 100 axis marking min, Q1, Q2, Q3 and max]
The data lies within the upper and lower inner fence, so the data has no outlier.
[Figure: box-and-whisker plots on a 10 to 100 axis marking min, Q1, Q2, Q3 and max]
SHAPE OF DATA DISTRIBUTION
(SYMMETRY AND SKEWNESS)
[Figure: box-and-whisker plots marking min, Q1, Q2, Q3 and max]
Example:
Data
40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66
(a) Display the data in a stem and leaf diagram.
(b) Find the first, second and third quartiles, upper and lower inner
fence.
(c) Construct a box and whisker plot for the above data.
Solution :
(a)
Stem | Leaf
3    | 1 2
4    | 0 1 1 5
5    | 0 1 2 4 5 7
6    | 1 1 2 5 5 6 6 6 8 9
7    | 0 2 9
Key: 5 | 4 means 54
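(b) Number of observations, n = 25, min = 31 , max = 79
\[ r = \frac{1}{4}(25) = 6.25, \quad Q_1 = \text{the 7th observation} = 50 \]
\[ r = \frac{2}{4}(25) = 12.5, \quad Q_2 = \text{the 13th observation} = 61 \]
\[ r = \frac{3}{4}(25) = 18.75, \quad Q_3 = \text{the 19th observation} = 66 \]
[Figure: box-and-whisker plot on a 10 to 100 axis with min = 31, Q1 = 50, Q2 = 61, Q3 = 66 and max = 79]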
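Key: 5 | 9 means 59 °F
\[ r = \frac{1}{4}(23) = 5.75, \quad Q_1 = \text{the 6th observation} = 64\ ^\circ\text{F} \]
\[ r = \frac{2}{4}(23) = 11.5, \quad Q_2 = \text{the 12th observation} = 68\ ^\circ\text{F} \]
\[ r = \frac{3}{4}(23) = 17.25, \quad Q_3 = \text{the 18th observation} = 70\ ^\circ\text{F} \]
Upper inner fence = Q3 + 1.5 (Q3 - Q1)
= 70 + 1.5(70 - 64)
= 79 °F
[Figure: box-and-whisker plot on a 50 to 80 axis with the values 51, 64, 68, 70 and 77 marked]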
From the boxplot, we can see that the minimum value,
51 °F, lies outside the lower inner fence, so this value is an outlier.
Therefore the whiskers are drawn from 59 °F to 77 °F.
[Figure: box-and-whisker plot on a 50 to 80 axis with lower inner fence = 55, upper inner fence = 79, Q1 = 64, Q2 = 68, Q3 = 70, whiskers from 59 to 77 and the outlier at 51]
The data is negatively skewed (skewed to the left).
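The fence calculation can be sketched in Python as below. The lower inner fence is taken as Q1 - 1.5 (Q3 - Q1), which is the usual convention and matches the value 55 in the temperature example; the function names are illustrative.

```python
def inner_fences(q1, q3):
    """Lower and upper inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Observations lying outside the inner fences."""
    lower, upper = inner_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Temperature example: Q1 = 64, Q3 = 70
print(inner_fences(64, 70))  # (55.0, 79.0), so 51 falls below the lower fence

# 25-observation example: Q1 = 50, Q3 = 66
data = [40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
        51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66]
print(inner_fences(50, 66))         # (26.0, 90.0)
print(find_outliers(data, 50, 66))  # [] (no outliers)
```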
GROUPED
DATA
MEAN MODE MEDIAN
MEASURES OF LOCATION
PERCENTILE QUARTILE DECILE
MEAN of a frequency distribution
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
where $\sum_{i=1}^{k} f_i$ = total frequency
$x_i$ = class mark
Example:
Find the mean for the following data
Class           Frequency, f_i
0 ≤ x < 10      2
10 ≤ x < 20     17
20 ≤ x < 30     26
30 ≤ x < 40     10
40 ≤ x < 50     5
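SOLUTION:
Class mark of the first class: $x = \dfrac{0 + 10}{2} = 5$
Class           Class mark, x_i   Frequency, f_i   f_i x_i
0 ≤ x < 10      5                 2                10
10 ≤ x < 20     15                17               255
20 ≤ x < 30     25                26               650
30 ≤ x < 40     35                10               350
40 ≤ x < 50     45                5                225
                                  Σ f_i = 60       Σ f_i x_i = 1490
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{1490}{60} = 24.83 \]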
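The same steps (class mark, then weighted mean) can be sketched in Python. The function name `grouped_mean` and the representation of classes as (lower, upper) pairs are illustrative assumptions.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i),
    where x_i is the class mark (midpoint) of each class."""
    if len(classes) != len(freqs):
        raise ValueError("classes and freqs must have the same length")
    marks = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [2, 17, 26, 10, 5]
print(round(grouped_mean(classes, freqs), 2))  # 1490 / 60, about 24.83
```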
MODE of a frequency distribution
\[ \text{mode} = L_m + \left( \frac{d_1}{d_1 + d_2} \right) c \]
$L_m$ = lower boundary of the class containing the
mode
$d_1$ = the difference between the frequency of the mode
class and the frequency of the class
immediately before it
$d_2$ = the difference between the frequency of the mode
class and the frequency of the class
immediately after it
$c$ = size of the mode class
Example:
Find the mode of the frequency distribution given below:
Class      Frequency
15 - 19    1
20 - 24    4
25 - 29    22
30 - 34    35
35 - 39    20
40 - 44    8
SOLUTION:
$L_m = 29.5$
$d_1 = 35 - 22 = 13$
$d_2 = 35 - 20 = 15$
$c = 5$
\[ \text{mode} = L_m + \left( \frac{d_1}{d_1 + d_2} \right) c = 29.5 + \left( \frac{13}{13 + 15} \right) 5 = 31.8 \]
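A possible Python sketch of this formula is shown below. The function name `grouped_mode` is illustrative, equal class widths are assumed, and a missing neighbouring class (when the mode class is first or last) is treated as having frequency 0, which is an assumption not stated in these notes.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of grouped data: L_m + d1 / (d1 + d2) * c."""
    i = freqs.index(max(freqs))                      # first class with the highest frequency
    before = freqs[i - 1] if i > 0 else 0            # assumption: no class before counts as 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    if d1 + d2 == 0:
        raise ValueError("mode class is not distinct from its neighbours")
    return lower_boundaries[i] + d1 / (d1 + d2) * width


lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
freqs = [1, 4, 22, 35, 20, 8]
print(round(grouped_mode(lower_boundaries, 5, freqs), 1))  # 31.8
```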
Mode from histogram
Draw a line from the left upper corner of the highest vertical bar
to the left upper corner of the next vertical bar.
Draw a line from the right upper corner of the highest vertical bar
to the right upper corner of the vertical bar before it.
The mode is estimated from the intersection point of both lines.
Histogram should be drawn on a
graph paper in order to obtain an
accurate answer
[Figure: histogram with a frequency axis from 5 to 35]
NOTE :
\[ m = L_m + \left( \frac{\frac{n}{2} - F_L}{f_m} \right) c \]
$L_m$ = lower boundary of the median class
$n$ = total frequency
$F_L$ = cumulative frequency of the class before the median class
$f_m$ = frequency of the median class
$c$ = size of the median class
Example:
Calculate the median for the following data
Class           Frequency, f
0 ≤ x < 5       7
5 ≤ x < 10      27
10 ≤ x < 15     35
15 ≤ x < 20     54
20 ≤ x < 25     63
25 ≤ x < 30     43
30 ≤ x < 35     25
35 ≤ x < 40     17
40 ≤ x < 45     9
45 ≤ x < 50     4
SOLUTION:
Class           Frequency, f    Cumulative frequency, F
0 ≤ x < 5       7               7
5 ≤ x < 10      27              34
10 ≤ x < 15     35              69
15 ≤ x < 20     54              123
20 ≤ x < 25     63              186
25 ≤ x < 30     43              229
30 ≤ x < 35     25              254
35 ≤ x < 40     17              271
40 ≤ x < 45     9               280
45 ≤ x < 50     4               284
                Σ f = 284
The median class is 20 ≤ x < 25 with the
corresponding frequency 63.
$L_m = 20$, $\sum f = 284$, $F_L = 123$, $f_m = 63$, $c = 5$
Hence, the median is
\[ m = L_m + \left( \frac{\frac{n}{2} - F_L}{f_m} \right) c = 20 + \left( \frac{\frac{1}{2}(284) - 123}{63} \right) 5 = 21.51 \]
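The search for the median class and the interpolation step can be sketched in Python. The function name `grouped_median` is illustrative and equal class widths are assumed.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped data: L_m + (n/2 - F_L) / f_m * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # F_L: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n/2.
            return lower + (half - cumulative) / f * width
        cumulative += f


lower_boundaries = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
freqs = [7, 27, 35, 54, 63, 43, 25, 17, 9, 4]
print(round(grouped_median(lower_boundaries, 5, freqs), 2))  # 21.51
```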
Quartile
Quartiles divide a set of data which are
arranged in ascending order into 4 equal
parts
Percentile
Percentiles divide a set of data which are
arranged in ascending order into 100 equal
parts
Decile
Deciles divide a set of data which are
arranged in ascending order into 10 equal
parts
For grouped data;
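\[ Q_k = L_k + \left( \frac{\frac{k}{4}\, n - F_L}{f_k} \right) c_k, \quad k = 1, 2, 3 \]
\[ P_k = L_k + \left( \frac{\frac{k}{100}\, n - F_L}{f_k} \right) c_k, \quad k = 1, 2, 3, \ldots, 99 \]
\[ D_k = L_k + \left( \frac{\frac{k}{10}\, n - F_L}{f_k} \right) c_k, \quad k = 1, 2, 3, \ldots, 9 \]
$L_k$ = lower boundary of the class where $Q_k$, $P_k$, $D_k$ lies
$n$ = total number of observations
$F_L$ = cumulative frequency before the class where $Q_k$, $P_k$, $D_k$ lies
$f_k$ = frequency of the class where $Q_k$, $P_k$, $D_k$ lies
$c_k$ = class width where $Q_k$, $P_k$, $D_k$ lies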
Example:
Height (cm)   3-5   6-8   9-11   12-14   15-17   18-20
Frequency     1     2     11     10      5       1
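Q1 = P25
\[ = 8.5 + \left( \frac{7.5 - 3}{11} \right) 3 = 9.73 \]
Q3 is in the fourth class, with boundaries (11.5 - 14.5)
Thus, $L_k = 11.5$, $f_k = 10$, $F_L = 14$, $c = 3$
Q3 = P75
\[ = 11.5 + \left( \frac{22.5 - 14}{10} \right) 3 = 14.05 \]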
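The quartile, decile and percentile formulas differ only in the fraction of n they target, so one Python sketch can cover all three. The function name `grouped_quantile` and its `fraction` parameter (k/4, k/10 or k/100) are illustrative, and equal class widths are assumed.

```python
def grouped_quantile(lower_boundaries, width, freqs, fraction):
    """Quantile of grouped data: L_k + (fraction * n - F_L) / f_k * c_k."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    n = sum(freqs)
    target = fraction * n
    cumulative = 0  # F_L: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f


# Height example: classes 3-5, 6-8, ..., 18-20 with boundaries 2.5, 5.5, ...
lower_boundaries = [2.5, 5.5, 8.5, 11.5, 14.5, 17.5]
freqs = [1, 2, 11, 10, 5, 1]
q1 = grouped_quantile(lower_boundaries, 3, freqs, 1 / 4)
q3 = grouped_quantile(lower_boundaries, 3, freqs, 3 / 4)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # Q1, Q3 and the interquartile range
```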
MEASURES OF DISPERSION
INTERQUARTILE RANGE
Defined as the difference
between the third quartile and
the first quartile
Interquartile range = Q3 - Q1
Variance and standard deviation
\[ \text{Variance, } S^2 = \frac{\sum f x^2 - \dfrac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1} \]
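$\sum f = 21$, $\sum fx = 204$, $\sum fx^2 = 2676$
Solution:
Range = upper boundary of the last class
- lower boundary of the first class
= 18.5 - 0.5 = 18
\[ S^2 = \frac{\sum f x^2 - \dfrac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1} = \frac{2676 - \dfrac{204^2}{21}}{20} = 34.71 \]
\[ S = \sqrt{34.71} = 5.892 \]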
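The arithmetic can be verified with a short Python sketch that works from the three sums. The function names are illustrative; `grouped_sums` shows how the sums would be obtained from class marks and frequencies when the table is available.

```python
import math


def grouped_sums(class_marks, freqs):
    """Return (sum f, sum f*x, sum f*x^2) for grouped data."""
    sum_f = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))
    return sum_f, sum_fx, sum_fx2


def grouped_variance(sum_f, sum_fx, sum_fx2):
    """Variance of grouped data: (sum fx^2 - (sum fx)^2 / sum f) / (sum f - 1)."""
    if sum_f < 2:
        raise ValueError("total frequency must be at least 2")
    return (sum_fx2 - sum_fx ** 2 / sum_f) / (sum_f - 1)


variance = grouped_variance(21, 204, 2676)
print(round(variance, 2), round(math.sqrt(variance), 3))  # 34.71 5.892
```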
Example:
Find the mean, variance and standard
deviation.