Chapter 1: Descriptive Statistics: 1.1 Some Terms

UECM2623/UCCM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II
Chapter 1: Descriptive Statistics

1.1 Some terms
Raw data
Raw data are data recorded in the sequence in which they were collected, before they are processed or ranked.
Table 1: The weights of 20 students in kg (Quantitative raw data)
61  66  68  65  65  62  67  67  68  60
71  73  69  69  63  70  74  70  64  71

Table 2: The grades of UCCM2623 of 20 students (Qualitative raw data)
A  B  B  A  C  B  A  B  C  B
B  A  B  C  A  D  B  D  C  B
Arrays
An arrangement of numerical raw data in ascending or descending order of magnitude.
60  61  62  63  64  65  65  66  67  67
68  68  69  69  70  70  71  71  73  74
Ungrouped data
Contains information on each member of a sample or population individually
Examples: Data presented in Table 1 and Table 2
Grouped data
Data presented in classes or intervals.
Example:
UCCM2623 Scores    Number of students
10-12              4
13-15              12
16-18              20
19-21              14

1.2 Organizing and Graphing Qualitative Data
1.2.1 Frequency distributions for qualitative data

A tabular arrangement that lists all categories and the number of elements that belong to each of the
categories.
Example 1.1. A sample of 25 students who were planning to go to college was taken. The courses they intended to choose were:
Engineering  Infotech     Engineering  Business     Business
Business     Business     Other        Biotech      Biotech
Biotech      Biotech      Infotech     Biotech      Biotech
Other        Business     Engineering  Business     Other
Engineering  Biotech      Biotech      Other        Infotech
Construct a frequency distribution table for these data.
Solution.
Course         Frequency
Biotech        8
Business       6
Engineering    4
Infotech       3
Others         4
Total:         25
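The tallying step can also be checked with a few lines of Python. This is only a sketch: the list `courses` is built from the category counts obtained by tallying the 25 responses of Example 1.1, and the variable names are illustrative.

```python
from collections import Counter

# Intended courses of the 25 students in Example 1.1 (order does not matter
# for a frequency distribution, so the responses are written by category).
courses = (
    ["Biotech"] * 8 + ["Business"] * 6 + ["Engineering"] * 4
    + ["Infotech"] * 3 + ["Other"] * 4
)

frequency = Counter(courses)       # category -> number of students
total = sum(frequency.values())    # should equal the sample size

for course in sorted(frequency):
    print(f"{course:<12}{frequency[course]:>3}")
print(f"{'Total':<12}{total:>3}")
```

`Counter` simply counts how many times each category appears, which is what the tally column does by hand.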
1.2.2 Relative frequency and percentage distributions

A tabular arrangement that lists the relative frequencies and percentages for all categories.
$$\text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$
$$\text{percentage} = \text{relative frequency} \times 100\%$$
Example 1.2. Determine the relative frequency and percentage distributions for the data in Example 1.1.
Solution.
Course         Relative Frequency    Percentage
Biotech        0.32                  32%
Business       0.24                  24%
Engineering    0.16                  16%
Infotech       0.12                  12%
Others         0.16                  16%
Total:         1.00                  100%
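As a rough check of the two formulas, the sketch below divides each category frequency by the total and then scales by 100. The frequencies are those tallied from Example 1.1; the dictionary names are illustrative.

```python
# Category frequencies tallied from Example 1.1.
frequency = {"Biotech": 8, "Business": 6, "Engineering": 4, "Infotech": 3, "Others": 4}

total = sum(frequency.values())                                # sum of all frequencies
relative = {c: f / total for c, f in frequency.items()}        # f / sum of f
percentage = {c: 100 * r for c, r in relative.items()}         # relative frequency x 100%

for course in frequency:
    print(f"{course:<12}{relative[course]:>6.2f}{percentage[course]:>6.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100%, which is a useful check on a hand-built table.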
1.2.3 Graphical presentation of qualitative data

Bar Graphs (bar chart)
A graph made of bars whose heights represent the frequencies of respective categories.
Example 1.3. Construct a bar chart for the data in Example 1.1.
Solution.
[Figure: bar chart with Frequency on the vertical axis (scale 0 to 8) and Course on the horizontal axis (Biotech, Business, Engineering, Infotech, Others).]
1.3 Organizing and graphing quantitative data
1.3.1 Frequency Distribution for quantitative data
Lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.
Note:
Generally, the grouping process destroys some of the original information.
The classes are non-overlapping, i.e. each value belongs to one and only one class.
Class
An interval that includes all the values that fall between two numbers, the lower and upper limits.
Class limits
The endpoints of each interval.
Class boundary
The class boundary is the dividing line between two classes. It is given by the midpoint of the upper limit of one class and the lower limit of the next higher class.
Class width / class size
The class width is the difference between the upper and lower class boundaries.
$$\text{class width} = \text{upper boundary} - \text{lower boundary}$$
Class mark / class midpoint
The class mark is the midpoint of the class interval.
$$\text{class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
Constructing frequency distribution tables
1. Determine the number of classes. This usually varies from 5 to 20, depending mainly on the number of observations in the data set.
   Find $2^k$, where $k$ is the smallest number such that $2^k$ is greater than the number of observations ($n$); $k$ is then taken as the number of classes.
2. Determine the class interval or width ($i$).
   It must cover at least the distance from the smallest value ($L$) in the raw data up to the largest value ($H$).
   $$\text{approximate class width} = \frac{\text{largest value } (H) - \text{smallest value } (L)}{\text{number of classes}}$$
   The class width is usually rounded to some convenient number.
   The rounding of this number may slightly change the number of classes initially intended.
3. Determine the lower limit of the first class, or the starting point.
   Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
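Steps 1 and 2 can be sketched in Python as follows. The function names are illustrative, and the figures used (50 observations ranging from 84 to 146) are those of the birth-weight sample in Example 1.4.

```python
def number_of_classes(n):
    """Smallest k such that 2**k is greater than the number of observations n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def approximate_class_width(smallest, largest, classes):
    """(H - L) / number of classes, before rounding to a convenient number."""
    return (largest - smallest) / classes


n, smallest, largest = 50, 84, 146
k = number_of_classes(n)                                 # 2**5 = 32 <= 50 < 64 = 2**6
width = approximate_class_width(smallest, largest, k)    # 62 / 6

print(k, round(width, 2))
```

Rounding the width to a convenient 10 and starting the first class at 80 gives seven classes rather than six, which illustrates how rounding may change the number of classes initially intended.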
Example 1.4. A sample of birth-weights (oz) from 50 consecutive deliveries is given below. Construct a frequency distribution table.
86   120  123  104  121  111  91   128  133  104
118  89   134  132  98   121  122  115  106  115
92   115  84   98   107  124  138  138  125  127
108  118  140  146  122  104  99   105  108  135
132  95   124  132  126  125  115  144  98   89
Solution.
Birthweights (oz)    f
80-89                4
90-99                7
100-109              8
110-119              7
120-129              13
130-139              8
140-149              3
Total                50
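The grouping can be reproduced with a short Python sketch. It assumes classes of width 10 starting at a lower limit of 80 (the classes 80-89, 90-99, ... used in this example); the helper `class_label` is illustrative.

```python
from collections import Counter

# The 50 birth-weights (oz) of Example 1.4.
weights = [
    86, 120, 123, 104, 121, 111, 91, 128, 133, 104,
    118, 89, 134, 132, 98, 121, 122, 115, 106, 115,
    92, 115, 84, 98, 107, 124, 138, 138, 125, 127,
    108, 118, 140, 146, 122, 104, 99, 105, 108, 135,
    132, 95, 124, 132, 126, 125, 115, 144, 98, 89,
]

LOWER_LIMIT = 80   # lower limit of the first class
WIDTH = 10         # class width


def class_label(x):
    """Return the class (e.g. '80-89') that the integer value x belongs to."""
    start = LOWER_LIMIT + ((x - LOWER_LIMIT) // WIDTH) * WIDTH
    return f"{start}-{start + WIDTH - 1}"


frequency = Counter(class_label(x) for x in weights)

for start in range(LOWER_LIMIT, 150, WIDTH):
    label = f"{start}-{start + WIDTH - 1}"
    print(f"{label:<10}{frequency[label]:>3}")
```

Because every value is mapped to exactly one label, the classes are non-overlapping and the frequencies sum to the number of observations.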
1.3.2 Relative frequency and percentage distributions

$$\text{relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$
$$\text{percentage} = \text{relative frequency} \times 100\%$$
Example 1.5. Calculate the relative frequency and percentage distributions for the data in Example 1.4.
Solution.
Birthweights (oz)    Class Boundaries    Relative Frequency    Percentage
80-89                79.5 - 89.5         0.08                  8%
90-99                89.5 - 99.5         0.14                  14%
100-109              99.5 - 109.5        0.16                  16%
110-119              109.5 - 119.5       0.14                  14%
120-129              119.5 - 129.5       0.26                  26%
130-139              129.5 - 139.5       0.16                  16%
140-149              139.5 - 149.5       0.06                  6%
Total                                    1.00                  100%
Grouped (quantitative) data can be displayed in a histogram or a polygon.
1.3.3 Histogram
Three types of histogram:
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram
A frequency histogram consists of a set of rectangles having
a) bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval sizes;
b) areas proportional to the class frequencies.
If the class intervals all have equal size, the heights of the rectangles are proportional to the class frequencies; otherwise, the heights of the rectangles must be adjusted.
Procedure to draw a histogram:
1. Mark the class boundaries of each interval on the horizontal axis.
2. Mark the frequencies (or relative frequencies or percentages) on the vertical axis.
3. Draw a bar for each class so that its height represents the frequency of that class. (There are no gaps between the bars.)
4. Label the histogram.
1.3.4 Polygon
A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram. To close the polygon, two more classes with zero frequency are added, one at each end, and their midpoints are joined to the graph.
Three types of polygon:
1. Frequency polygon
2. Relative frequency polygon
3. Percentage polygon
Example 1.6. Reconsider the data in Example 1.4 and draw
i) the frequency histogram and frequency polygon
ii) the relative frequency histogram and relative frequency polygon
iii) the percentage histogram and percentage polygon
The frequency histogram and frequency polygon
[Figure: Frequency (vertical axis) against Birth-weight (oz) (horizontal axis), marked at the class boundaries 79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5 and 149.5.]
The relative frequency histogram and relative frequency polygon
[Figure: Relative Frequency (vertical axis, 0.05 to 0.30) against Birth-weight (oz) (horizontal axis), marked at the same class boundaries.]
The percentage histogram and percentage polygon
[Figure: Percentage (vertical axis, 5 to 30) against Birth-weight (oz) (horizontal axis), marked at the same class boundaries.]
Example 1.7. The frequency distribution gives the weights of 35 objects, measured to the nearest kg. Draw a histogram to illustrate the data.
Weight (kg)    Frequency
6-8            4
9-11           6
12-17          10
18-20          3
21-29          12
Solution.
$$\text{adjusted frequency} = \frac{\text{standard class width}}{\text{class width}} \times \text{frequency}$$
Taking the standard class width to be 3:
Weight (kg)    Class width    Frequency    Height of rectangle (adjusted frequency)
6-8            3              4            4
9-11           3              6            6
12-17          6              10           5
18-20          3              3            3
21-29          9              12           4
[Figure: histogram with Adjusted Frequency on the vertical axis (scale 0 to 6) and Weight (kg) on the horizontal axis (scale 5.5 to 29.5 in steps of 3).]
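The adjustment for unequal class widths can be sketched in Python as below. The standard class width of 3 is an assumption (it is the width of the narrowest classes in Example 1.7), and the class boundaries are taken as half a unit beyond the class limits because the weights are measured to the nearest kg.

```python
# (lower limit, upper limit, frequency) for the classes of Example 1.7.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]

STANDARD_WIDTH = 3  # assumed standard class width

rows = []
for lower, upper, f in classes:
    lower_boundary, upper_boundary = lower - 0.5, upper + 0.5
    width = upper_boundary - lower_boundary          # class width from the boundaries
    adjusted = f * STANDARD_WIDTH / width            # height of the rectangle
    rows.append((lower_boundary, upper_boundary, width, adjusted))

heights = [adjusted for _, _, _, adjusted in rows]

for (lower, upper, f), (_, _, width, adjusted) in zip(classes, rows):
    print(f"{lower}-{upper:<4} width={width:<4} f={f:<3} height={adjusted}")
```

With this scaling, the area of each rectangle (height times width) stays proportional to the class frequency, as a histogram requires.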
1.3.5 Cumulative frequency distribution

A table that presents the total number of values that fall below the upper boundary of each class.
It is constructed for quantitative data only.
$$\text{cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{sum of all frequencies in the data set}}$$
$$\text{cumulative percentage} = \text{cumulative relative frequency} \times 100\%$$
Example 1.8. Refer to the data in Example 1.4 and construct the cumulative frequency, cumulative relative frequency and cumulative percentage distributions.
Birthweights (oz)    Cumulative frequency    Cumulative relative frequency    Cumulative percentage, %
<79.5                0                       0                                0%
<89.5                4                       0.08                             8%
<99.5                11                      0.22                             22%
<109.5               19                      0.38                             38%
<119.5               26                      0.52                             52%
<129.5               39                      0.78                             78%
<139.5               47                      0.94                             94%
<149.5               50                      1                                100%
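A cumulative distribution is a running total of the class frequencies. The sketch below uses the class frequencies obtained by tallying the birth-weights of Example 1.4 into the classes 80-89, ..., 140-149; the variable names are illustrative.

```python
from itertools import accumulate

upper_boundaries = [89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
frequencies = [4, 7, 8, 7, 13, 8, 3]     # class frequencies, Example 1.4

cumulative = list(accumulate(frequencies))            # running totals
n = cumulative[-1]                                    # sum of all frequencies
cumulative_relative = [c / n for c in cumulative]
cumulative_percentage = [100 * r for r in cumulative_relative]

for b, c, r, p in zip(upper_boundaries, cumulative,
                      cumulative_relative, cumulative_percentage):
    print(f"<{b:<7}{c:>3}{r:>7.2f}{p:>6.0f}%")
```

The last cumulative frequency must equal the number of observations, so the last cumulative relative frequency is 1.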
1.3.6 Ogive / Cumulative frequency curve

A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies of respective classes.
Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.
2. If relative cumulative frequency is used in place of cumulative frequency, the graph is called a relative cumulative frequency curve or percentage ogive.
Example 1.9. Draw an ogive for the data in Example 1.4. From the ogive, estimate
a) the number of deliveries with birth-weights less than 95 oz;
b) the value of X, if 20% of the deliveries had birth-weights of X oz or more.
Solution.
[Figure: ogive with Cumulative frequency on the vertical axis (scale 0 to 55 in steps of 5) and Birth-Weight (oz) on the horizontal axis, marked at the class boundaries 79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5 and 149.5.]
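Reading values off an ogive can be imitated numerically. The sketch below assumes the plotted points are joined by straight lines, so that readings between class boundaries are obtained by linear interpolation; a hand-drawn smooth curve may give slightly different estimates. The cumulative frequencies are those tallied from Example 1.4, and the function name is illustrative.

```python
boundaries = [79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
cumulative = [0, 4, 11, 19, 26, 39, 47, 50]   # cumulative frequencies, Example 1.4


def interpolate(xs, ys, x):
    """Linear interpolation of y at x along the points (xs[i], ys[i]); xs increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the range of the ogive")


# a) number of deliveries below 95 oz: read the curve at x = 95
below_95 = interpolate(boundaries, cumulative, 95)

# b) 20% of deliveries are at X oz or more, so 80% (40 of 50) are below X:
#    read the curve "backwards" at cumulative frequency 40
x_top_20 = interpolate(cumulative, boundaries, 0.80 * cumulative[-1])

print(round(below_95, 2), round(x_top_20, 2))
```

Under the straight-line assumption this gives about 8 deliveries below 95 oz and X of roughly 131 oz.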
1.4 Measures of central tendency

A measure of central tendency represents a data set by a single numerical (typical) value that summarizes the data.
It locates the centre of the values, that is, the centre of a histogram or a frequency distribution curve.
Three measures are considered here:
1. Median
2. Mode
3. Mean
1.4.1 Median
The median is the value of the middle term in a data set that has been ranked in increasing or decreasing order.
The median is the value of the $\frac{n+1}{2}$th term in a ranked data set, where $n$ is the total number of elements in the set.
Note:
1. If n is odd, the median is the value of the middle term in the ranked data.
2. If n is even, the median is the average of the two middle terms.
Example 1.10. Find the median of set A = { 10, 5, 19, 8, 3 } and set B = { 2, 7, 3, 6, 4, 5 }
Solution.
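One way to check a hand calculation is the following Python sketch of the rule above: rank the data, then take the middle term (n odd) or the average of the two middle terms (n even). The function name is illustrative.

```python
def median(values):
    """Median of ungrouped data: the (n + 1)/2-th term of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]                          # n odd: the middle term
    return (ranked[middle - 1] + ranked[middle]) / 2   # n even: average of two middle terms


A = [10, 5, 19, 8, 3]
B = [2, 7, 3, 6, 4, 5]

print(median(A), median(B))
```

For set A the ranked data are 3, 5, 8, 10, 19, so the third term is the median; for set B the third and fourth terms of 2, 3, 4, 5, 6, 7 are averaged.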
Note:
The median is not influenced by extreme values. (Extreme values are values that are very small or very large relative to the majority of the values in a data set.)
For grouped data in the form of a frequency distribution of single-valued classes
The median can be found either from the ungrouped frequency distribution or from the cumulative frequency distribution.
Example 1.11. Find the median of the following frequency distribution.

No. of children    0    1    2    3    4    5
Frequency          3    5    12   9    4    2
Solution.
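The cumulative-frequency approach can be sketched in Python: accumulate the frequencies until the running total reaches the position of the median term. The helper names are illustrative.

```python
values = [0, 1, 2, 3, 4, 5]            # number of children
frequencies = [3, 5, 12, 9, 4, 2]


def value_at_position(values, frequencies, position):
    """Value of the position-th term (1-based) of the ranked data."""
    running = 0
    for value, f in zip(values, frequencies):
        running += f                   # cumulative frequency so far
        if position <= running:
            return value
    raise ValueError("position exceeds the number of observations")


def median_from_distribution(values, frequencies):
    n = sum(frequencies)
    if n % 2 == 1:
        return value_at_position(values, frequencies, (n + 1) // 2)
    lower = value_at_position(values, frequencies, n // 2)
    upper = value_at_position(values, frequencies, n // 2 + 1)
    return (lower + upper) / 2


print(median_from_distribution(values, frequencies))
```

Here n = 35, so the median is the 18th term; the cumulative frequencies 3, 8, 20 show that the 18th term falls in the class "2 children".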
1.4.2 Mode
Mode is the value that occurs with the highest frequency in a data set.
Example 1.12. Find the mode of each of the following data sets.
i) 74, 9, 5, 8, 3, 8, 8
ii) 2, 2, 6, 6, 8, 8, 9, 9
iii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iv) B, C, D, A, A, C, C, C, B, A
Solution.
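A short Python sketch for finding modes is given below. It returns every value that attains the highest frequency, and it adopts the convention (an assumption here, since conventions differ) that a data set in which all values occur equally often has no mode.

```python
from collections import Counter


def modes(data):
    """Values with the highest frequency; [] if every value is equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []          # assumed convention: no mode
    return [value for value, c in counts.items() if c == highest]


print(modes([74, 9, 5, 8, 3, 8, 8]))                       # one mode
print(modes([2, 2, 6, 6, 8, 8, 9, 9]))                     # all equally frequent
print(modes([2, 6, 6, 6, 3, 8, 8, 8, 3]))                  # two modes
print(modes(list("BCDAACCCBA")))                           # qualitative data
```

The same function works for quantitative and qualitative data, because it only counts occurrences.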
Note:
1. The mode is not influenced by extreme values.
2. A data set may have no mode, one mode (unimodal), two modes (bimodal) or more than two modes (multimodal).
3. The mode can be used for both quantitative and qualitative data.
Example 1.13. Find the mode of the following frequency distribution.

No. of children    0    1    2    3    4
Frequency          3    5    12   9    4
Solution.
1.4.3 Mean
The mean for population data $x_1, x_2, \ldots, x_N$ is denoted by $\mu$ and is defined as
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The mean for sample data $x_1, x_2, \ldots, x_n$ is denoted by $\bar{X}$ and is defined as
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example 1.14. Find the arithmetic mean for the data set { 158, 189, 265, 127, 191 }
Solution.
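The calculation can be checked with the sketch below. The extra value 1000 appended afterwards is hypothetical; it is included only to illustrate how a single extreme value pulls the mean.

```python
data = [158, 189, 265, 127, 191]

mean = sum(data) / len(data)            # (x1 + ... + xn) / n

# Hypothetical extreme value, not part of the example data.
with_extreme = data + [1000]
mean_with_extreme = sum(with_extreme) / len(with_extreme)

print(mean, round(mean_with_extreme, 2))
```

The sum of the five values is 930, so the mean is 186, a number that does not itself appear in the data set.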
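Note:
1. The mean does not necessarily take one of the values in the original data.
2. The mean is influenced by extreme values.
For grouped data in the form of a frequency distribution of single-valued classes, with $k$ distinct values $x_1, \ldots, x_k$ occurring with frequencies $f_1, \ldots, f_k$ and $n = \sum f_i$:
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i}$$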
Example 1.15. Find the mean of the following frequency distribution.
$x_i$    2    5    6    8
$f_i$    1    3    4    2
Solution.
$x_i$     $f_i$    $f_i x_i$
2         1        2
5         3        15
6         4        24
8         2        16
Total     10       57
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{57}{10} = 5.7$$
For grouped data in the form of a frequency distribution
Suppose the data are grouped into $k$ class intervals, and
$f_i$ = the frequency of class $i$
$m_i$ = the midpoint of class $i$
$N = \sum f_i$ = population size
$n = \sum f_i$ = sample size
Mean for population data:
$$\mu = \frac{\sum f_i m_i}{N}$$
Mean for sample data:
$$\bar{X} = \frac{\sum f_i m_i}{n}$$
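A minimal Python sketch of the grouped-data mean is shown below, applied to the weight distribution used in Example 1.7 and Example 1.16. Each class is represented by its midpoint, so the result is an approximation to the mean of the underlying raw data; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each weight class in kg.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]   # m_i
frequencies = [f for _, _, f in classes]                           # f_i

n = sum(frequencies)                                               # sum of f_i
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))        # sum of f_i * m_i
mean = sum_fm / n

print(midpoints, sum_fm, round(mean, 2))
```

The same arithmetic applies whether the distribution is treated as a population or as a sample; only the symbol for the mean changes.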
Example 1.16. Find the mean of the following frequency distribution.
Weight (kg)    Frequency
6-8            4
9-11           6
12-17          10
18-20          3
21-29          12
Solution.
Class interval    Class midpoint ($m_i$)    Frequency ($f_i$)    $f_i m_i$
6-8               7                         4                    28
9-11              10                        6                    60
12-17             14.5                      10                   145
18-20             19                        3                    57
21-29             25                        12                   300
Total                                       35                   590
$$\text{mean} = \frac{\sum f_i m_i}{\sum f_i} = \frac{590}{35} \approx 16.86 \text{ kg}$$

1.5 Measures of dispersion
The measures of central tendency alone are sometimes not enough to reveal the whole picture of the distribution of a data set, because they do not describe how the data are spread out.
Data set    Data                                      Mean    Median    Mode
A           1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11      6       6         6
B           4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8        6       6         6
[Figure: dot plots of Set A (values from 1 to 11) and Set B (values from 4 to 8).]
Note: The mean, median and mode are the same for data sets A and B, but the distributions of the data are different.
1.5.1 Measures of dispersion for ungrouped data

Range
The range for a data set $\{x_1, x_2, \ldots, x_n\}$ is defined to be the difference between the largest value and the smallest value.
$$\text{Range} = \text{largest value} - \text{smallest value}$$
Example 1.17. Find the range for data set A and data set B above.
Variance
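The variance is the average of the squared deviations of the data from the mean.
Consider a population of $N$ measurements $x_1, x_2, \ldots, x_N$.
$$\text{Population mean} = \mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
Consider a sample of $n$ measurements $x_1, x_2, \ldots, x_n$.
$$\text{Sample mean} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$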
Standard Deviation
The standard deviation is the positive square root of the variance.
$$\text{Sample standard deviation} = s = \sqrt{s^2}$$
$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2}$$
Note:
1. A small standard deviation means that the data are distributed closely about their mean.
2. A large standard deviation means that the data are widely scattered about their mean.
3. It is influenced by extreme values.
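Example 1.18. The data show the salary per day of all 6 employees of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for these data.
Solution.
$$\text{Mean, } \mu = \frac{\sum x_i}{N} = \frac{177.00}{6} = 29.50$$
$x_i$      $x_i - \mu$    $(x_i - \mu)^2$    $x_i^2$
29.50      0.00           0.00               870.25
16.50      -13.00         169.00             272.25
35.40      5.90           34.81              1253.16
21.30      -8.20          67.24              453.69
49.70      20.20          408.04             2470.09
24.60      -4.90          24.01              605.16
Total      0.00           703.10             5924.60
Method 1:
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{703.10}{6} \approx 117.18$$
$$\text{Population standard deviation} = \sigma = \sqrt{117.18} \approx 10.83$$
Method 2:
$$\sum x_i^2 = 5924.60$$
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{5924.60}{6} - 29.50^2 \approx 117.18$$
$$\text{Population standard deviation} = \sigma \approx 10.83$$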
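Both methods can be verified numerically with the Python sketch below, which evaluates the definitional formula and the shortcut formula on the six salaries and confirms that they agree (up to floating-point rounding). Variable names are illustrative.

```python
import math

salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]

N = len(salaries)
mu = sum(salaries) / N                                     # population mean

# Method 1: average squared deviation from the mean
variance_1 = sum((x - mu) ** 2 for x in salaries) / N

# Method 2: mean of the squares minus the square of the mean
variance_2 = sum(x ** 2 for x in salaries) / N - mu ** 2

sigma = math.sqrt(variance_1)                              # population standard deviation

print(round(mu, 2), round(variance_1, 2), round(variance_2, 2), round(sigma, 2))
```

Method 2 avoids computing each deviation, which is convenient by hand, although it can be more sensitive to rounding when the mean is large relative to the spread.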
Example 1.19. A sample consists of 5 data values: 72, 49, 79, 55 and 57. Calculate the variance and
standard deviation.
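Solution.
$$n = 5, \quad \sum x_i = 312, \quad \sum x_i^2 = 20100$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[20100 - \frac{312^2}{5}\right] = \frac{631.2}{4} = 157.8$$
$$\text{Sample standard deviation} = s = \sqrt{157.8} \approx 12.56$$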
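As a cross-check, the sketch below evaluates the shortcut formula for the sample variance and compares it with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics

sample = [72, 49, 79, 55, 57]

n = len(sample)
sum_x = sum(sample)                       # sum of x_i
sum_x2 = sum(x ** 2 for x in sample)      # sum of x_i squared

s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # sample variance, shortcut formula
s = math.sqrt(s2)                         # sample standard deviation

library_s2 = statistics.variance(sample)  # library value for comparison

print(sum_x, sum_x2, round(s2, 2), round(s, 2))
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.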
1.5.2 Measures of dispersion for grouped data

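Variance
For data grouped into $k$ classes with midpoints $m_i$ and frequencies $f_i$:
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$$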
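The two grouped-data formulas can be sketched in Python as follows. The classes used are the height classes of Example 1.20; each class is represented by its midpoint, so both results are approximations based on the grouped table. Variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for the height classes of Example 1.20.
classes = [(20, 22, 3), (23, 25, 6), (26, 28, 12), (29, 31, 9), (32, 34, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]    # m_i
frequencies = [f for _, _, f in classes]                            # f_i

total_f = sum(frequencies)                                          # N (or n)
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))         # sum f_i m_i
sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints))   # sum f_i m_i^2

# a) treated as a population
population_variance = sum_fm2 / total_f - (sum_fm / total_f) ** 2

# b) treated as a sample
sample_variance = (sum_fm2 - sum_fm ** 2 / total_f) / (total_f - 1)

print(sum_fm, sum_fm2, round(population_variance, 3), round(sample_variance, 3))
```

The two variances differ only in the divisor, so the sample variance equals the population variance multiplied by N / (N - 1).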
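Example 1.20. Find the variance of the following frequency distribution if it represents
a) a population
b) a sample
Height (m)    Frequency
20-22         3
23-25         6
26-28         12
29-31         9
32-34         2
Solution.
Height    Midpoint, m    Frequency, f    fm     fm²
20-22     21             3               63     1323
23-25     24             6               144    3456
26-28     27             12              324    8748
29-31     30             9               270    8100
32-34     33             2               66     2178
Total:                   32              867    23805
a) As a population:
$$\sigma^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2 = \frac{23805}{32} - \left(\frac{867}{32}\right)^2 \approx 9.83$$
b) As a sample:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{867^2}{32}\right] \approx 10.15$$

1.6 Measures of position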
Measures of position determine the position of a single value in relation to other values in a sample or a
population data set.
1.6.1 Quartiles
Quartiles are three summary measures that divide a ranked data set into four equal parts.
The second quartile (Q2) is the median of the data set.
The first quartile (Q1) is the value of the middle term among the observations that are less than the median.
The third quartile (Q3) is the value of the middle term among the observations that are greater than the median.
To Find The Quartiles of Ungrouped Data

Consider $n$ items arranged in ascending order. Then,
The first quartile = Lower quartile = $Q_1$ = $\frac{1}{4}(n+1)$th value
The second quartile = Median = $Q_2$ = $\frac{1}{2}(n+1)$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3}{4}(n+1)$th value
When $n$ is odd, the rule locates the exact position of the quartiles.
When $n$ is even,
a) When $n$ is even and $\frac{n}{2}$ is even, round all decimal values of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$ to the .5 value, for example: 2.25 → 2.5, 6.75 → 6.5.
b) When $n$ is even and $\frac{n}{2}$ is odd, round up the values of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$ whose decimal part is greater than .5 and round down the values whose decimal part is smaller than .5, for example: 3.75 → 4, 2.25 → 2.
To Find The Quartiles of Grouped Data (from Ogive)

The first quartile = Lower quartile = $Q_1$ = $\frac{n}{4}$th value
The second quartile = Median = $Q_2$ = $\frac{n}{2}$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3n}{4}$th value
1.6.2 Interquartile Range(IQR)

$$\text{Interquartile range, IQR} = Q_3 - Q_1$$
$$\text{Semi-interquartile range} = \text{quartile deviation} = \frac{Q_3 - Q_1}{2}$$
1.6.3 Percentiles
The (approximate) value of the $k$th percentile, denoted by $P_k$, is
$$P_k = \text{value of the } \frac{kn}{100}\text{th term in a ranked data set}$$
where $k$ denotes the number of the percentile and $n$ represents the sample size. Note that $\frac{kn}{100}$ is rounded to the nearest integer or .5 value, for example: 2.2 → 2.0, 2.3 → 2.5, 2.7 → 2.5, 2.8 → 3.0.
Example 1.21. The following are the scores of 12 students in a mathematics class.
75  80  68  53  99  58  76  73  85  88  91  79
a) Find the values of the three quartiles. Where does the score of 88 lie in relation to these quartiles?
b) Find the interquartile range.
c) Find the quartile deviation.
d) Find the value of the 62nd percentile.
Solution.
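The positional rules of this section can be sketched in Python as below. The helper names are illustrative. One assumption is made beyond the notes: when kn/100 lies exactly halfway between an integer and a .5 value (for example 2.25), `round_to_half` rounds it upward, a case the notes do not cover. Other textbooks and software use different quartile conventions, so their answers may differ slightly.

```python
import math


def term(ranked, position):
    """Value at a 1-based position that is an integer or ends in .5."""
    lower = math.floor(position)
    if position == lower:
        return ranked[lower - 1]
    return (ranked[lower - 1] + ranked[lower]) / 2    # average of neighbouring terms


def quartile_position(n, j):
    """Position of the j-th quartile (j = 1, 2, 3) using the rounding rules above."""
    p = j * (n + 1) / 4
    if n % 2 == 1 or j == 2:
        return p                       # already an integer or a .5 value
    if (n // 2) % 2 == 0:
        return math.floor(p) + 0.5     # n/2 even: round to the .5 value
    return float(round(p))             # n/2 odd: round to the nearest integer


def round_to_half(x):
    """Round to the nearest integer or .5 value (exact ties round up: an assumption)."""
    return math.floor(2 * x + 0.5) / 2


scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]
ranked = sorted(scores)
n = len(ranked)

q1, q2, q3 = (term(ranked, quartile_position(n, j)) for j in (1, 2, 3))
iqr = q3 - q1
quartile_deviation = iqr / 2
p62 = term(ranked, round_to_half(62 * n / 100))

print(q1, q2, q3, iqr, quartile_deviation, p62)
```

With n = 12 the quartile positions are 3.5, 6.5 and 9.5, and the 62nd percentile position 7.44 is rounded to 7.5. The score of 88 lies above the third quartile, that is, in the top 25% of the scores.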