Escolar Documentos
Profissional Documentos
Cultura Documentos
A.Ramesh
Department of Mechanical Engineering NIT Calicut
Email: ramesh@nitc.ac.in, ramesh.anbanandam@gmail.com
Phone:0495-228- 6540
So far….
• Importance of • Collecting data
statistics in business • Difference between
• Classification of sample and population
statistics • Finding a meaningful
• Classification of data pattern in the data
– Nominal – Frequency distribution
– Ordinal – Histogram
– Interval – Frequency polygons
– Ratio – Ogives
– Pareto diagram
– Scatter plot
ramesh.anbanandam@gmail.com 2
Measures of Central Tendency:
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• A single number to describe the
characteristic of a set of data
ramesh.anbanandam@gmail.com 3
Summary statistics
• Central tendency or • Dispersion
measures of location – Skewness
– Arithmetic mean – Kurtosis
– Weighted mean – Range
– Geometric mean – Interfractile range
– Median – Variance
– Percentile – Standard score
– Coefficient of variation
ramesh.anbanandam@gmail.com 4
Arithmetic Mean
• Commonly called ‘the mean’
• It is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set,
including extreme values
• Computed by summing all values in the
data set and dividing the sum by the number
of values in the data set
ramesh.anbanandam@gmail.com 5
Population Mean
X X X X
1 2 3
... X N
N N
24 13 19 26 11
5
93
5
18. 6
ramesh.anbanandam@gmail.com 6
Sample Mean
X
X X X X
1 2 3
... X n
n n
57 86 42 38 90 66
6
379
6
63.167
ramesh.anbanandam@gmail.com 7
Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights
fM
f
fM
N
f 1 M 1 f 2 M 2 f 3 M 3 f iM i
f 1 f 2 f 3 fi
ramesh.anbanandam@gmail.com 8
Calculation of Grouped Mean
Class Interval Frequency(f) Class Midpoint(M) fM
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
50 2150
fM
2150
43 . 0
f 50
ramesh.anbanandam@gmail.com 9
Median
• Middle value in an ordered array of
numbers.
• Applicable for ordinal, interval, and ratio
data
• Not applicable for nominal data
• Unaffected by extremely large and
extremely small values.
ramesh.anbanandam@gmail.com 10
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median
is the middle term of the ordered array.
– If there is an even number of terms, the median
is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is
given by (n+1)/2.
ramesh.anbanandam@gmail.com 11
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
ramesh.anbanandam@gmail.com 13
Median of Grouped Data
N
cfp
Median L 2 W
fmed
Where:
L the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies
ramesh.anbanandam@gmail.com 14
Median of Grouped Data -- Example
Cumulative N
cfp
Class Interval Frequency Frequency Md L 2 W
20-under 30 6 6 fmed
30-under 40 18 24 50
40-under 50 11 35 24
50-under 60 11 46 40 2 10
11
60-under 70 3 49
40.909
70-under 80 1 50
N = 50
ramesh.anbanandam@gmail.com 15
Mode
• The most frequently occurring value in a
data set
• Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more
than two modes
ramesh.anbanandam@gmail.com 16
Mode -- Example
• The mode is 44.
• There are more 44s 35 41 44 45
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
ramesh.anbanandam@gmail.com 17
Mode of Grouped Data
ramesh.anbanandam@gmail.com 19
Percentiles: Computational Procedure
• Organize the data into an ascending ordered array.
• Calculate the
percentile location:
P
i (n)
100
• Determine the percentile’s location and its value.
• If i is a whole number, the percentile is the
average of the values at the i and (i+1) positions.
• If i is not a whole number, the percentile is at the
(i+1) position in the ordered array.
ramesh.anbanandam@gmail.com 20
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of
30th percentile: 30
i (8 ) 2. 4
100
• The location index, i, is not a whole number; i+1 =
2.4+1=3.4; the whole number portion is 3; the
30th percentile is at the 3rd location of the array;
the 30th percentile is 13.
ramesh.anbanandam@gmail.com 21
Dispersion
• Reliability of measure of central tendency
• To compare dispersion of various samples
ramesh.anbanandam@gmail.com 22
Variability
No Variability in Cash Flow Mean
Mean
ramesh.anbanandam@gmail.com 23
Measures of Variability or dispersion
– Range
– Interfractile range
– Deviation of Variance
– Standard Deviation
ramesh.anbanandam@gmail.com 24
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread
or the dispersion of a set of data.
• Common Measures of Variability
– Range
– Inter quartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
ramesh.anbanandam@gmail.com 25
Range
• The difference between the largest and the
smallest values in a set of data
• Simple to compute 35 41 44 45
• Ignores all data points except 37 41 the44 46
two extremes
• Example: 37 43 44 46
Range 39 43 = 44 46
Largest - Smallest =
48 - 35 = 13 40 43 44 46
40 43 45 48
ramesh.anbanandam@gmail.com 26
Quartiles
• Measures of central tendency that divide a group of
data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the
median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the
data set
ramesh.anbanandam@gmail.com 27
Quartiles
Q1 Q2 Q3
ramesh.anbanandam@gmail.com 28
Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
• Q1 25 109 114
i (8 ) 2 Q1 111 . 5
100 2
ramesh.anbanandam@gmail.com 29
Interquartile Range
ramesh.anbanandam@gmail.com 30
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean:
X 65
13
N 5
• Deviations from the mean: -8, -4, 3, 4, 5
-4 +5
-8 +4
+3
0 5 10 15 20
ramesh.anbanandam@gmail.com 31
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X X X X
M . A. D.
5 -8 +8 N
9 -4 +4 24
16 +3 +3
17 +4 +4
5
18 +5 +5 4. 8
0 24
ramesh.anbanandam@gmail.com 32
Population Variance
• Average of the squared deviations from the
arithmetic mean
X X
X X
2
2
2
5 -8 64
9 -4 16 N
16 +3 9 130
17 +4 16 5
18 +5 25 26.0
0 130
ramesh.anbanandam@gmail.com 33
Population Standard Deviation
• Square root of the
variance
X
2
X X
2
X
2
N
5 -8 64 130
9 -4 16
5
16 +3 9
26.0
17 +4 16
18 +5 25
2
0 130
26.0
ramesh.anbanandam@gmail.com
51
. 34
Sample Variance
• Average of the squared deviations from the
arithmetic mean
X X X XX X X
2
2
2
2,398 625 390,625 S
n 1
1,844 71 5,041
1,539 -234 54,756 663,866
1,311 -462 213,444 3
7,092 0 663,866 221,288.67
ramesh.anbanandam@gmail.com 35
Sample Standard Deviation
• Square root of the
X X
2
sample variance 2
S
n 1
X X X XX
2
663,866
2,398 625 390,625
1,844 71 5,041 3
1,539 -234 54,756 221,288.67
1,311 -462 213,444 2
7,092 0 663,866 S S
221,288.67
470.41
ramesh.anbanandam@gmail.com 36
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
ramesh.anbanandam@gmail.com 37
Standard Deviation as an
Indicator of Financial Risk
Annualized Rate of Return
Financial
Security
A 15% 3%
B 15% 7%
ramesh.anbanandam@gmail.com 38
Empirical Rule
• Data are normally distributed (or
approximately normal)
1 68
2 95
3 99.7
ramesh.anbanandam@gmail.com 39
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion
C .V . 100
ramesh.anbanandam@gmail.com 40
Coefficient of Variation
1 29 2 84
1
4.6 2
10
. . 100
CV 1
1
CV
. . 2
2
100
1 2
4.6 10
100 100
29 84
1586
. 1190
.
ramesh.anbanandam@gmail.com 41
Variance and Standard Deviation
of Grouped Data
Population Sample
f M S MX
2 2
2 f
2
n 1
N
2
2 S S
ramesh.anbanandam@gmail.com 42
Population Variance and Standard
Deviation of Grouped Data(mu=43)
M M M M
2
f fM
2
Class Interval f
M
2 2
f 7200
144 12
2
144
N 50
ramesh.anbanandam@gmail.com 43
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
ramesh.anbanandam@gmail.com 44
Skewness
ramesh.anbanandam@gmail.com 45
Skewness..
The skewness of a distribution is measured by
comparing the relative positions of the mean,
median and mode.
• Distribution is symmetrical
• Mean = Median = Mode
• Distribution skewed right
• Median lies between mode and mean, and
mode is less than mean
• Distribution skewed left
• Median lies between mode and mean, and
mode is greater than mean
ramesh.anbanandam@gmail.com 46
Skewness
ramesh.anbanandam@gmail.com 47
Coefficient of Skewness
• Summary measure for skewness
3 M d
S
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
ramesh.anbanandam@gmail.com 48
Coefficient of Skewness
23
1
2
26 3
29
M 26
d1 M d2 26 Md3 26
12.3
1 2
12.3 3
12.3
3 M d1
3 2 M d2
3 3 M d3
S S S
1
1
1
2
2
3
3
Leptokurtic
Mesokurtic
Platykurtic
ramesh.anbanandam@gmail.com 50
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
ramesh.anbanandam@gmail.com 51
Box and Whisker Plot
Minimum Q1 Q2 Q3 Maximum
ramesh.anbanandam@gmail.com 52
Skewness: Box and Whisker Plots,
and Coefficient of Skewness
S<0 S=0 S>0
X Y
XY n
2 X 2
2
Y 2
X
n Y
n
1 r 1
ramesh.anbanandam@gmail.com 54
Three Degrees of Correlation
r<0 r>0
r=0
ramesh.anbanandam@gmail.com 55
Computation of r for
an Example
Futures
Interest Index
Day X Y X2 Y2 XY
1 7.43 221 55.205 48,841 1,642.03
2 7.48 222 55.950 49,284 1,660.56
3 8.00 226 64.000 51,076 1,808.00
4 7.75 225 60.063 50,625 1,743.75
5 7.60 224 57.760 50,176 1,702.40
6 7.63 223 58.217 49,729 1,701.49
7 7.68 223 58.982 49,729 1,712.64
8 7.67 226 58.829 51,076 1,733.42
9 7.59 226 57.608 51,076 1,715.34
10 8.07 235 65.125 55,225 1,896.45
11 8.03 233 64.481 54,289 1,870.99
12 8.00 241 64.000 58,081 1,928.00
Summations 92.93 2,725 720.220 619,207 21,115.07
ramesh.anbanandam@gmail.com 56
Computation of r
for the Example
X Y
XY
n
r
X
2
Y
2
X n Y n
2 2
92.93 2725
21115
, .07
12
720.22
92 .93
2
619,207 2725
2
12 12
.815
ramesh.anbanandam@gmail.com 57
Scatter Plot and Correlation Matrix
for the Example
245
240
Futures Index
235
230
225
220
7.40 7.60 7.80 8.00 8.20
Interest
ramesh.anbanandam@gmail.com 58