# STAT101 Introductory Statistics

Data Distributions & Summary Measures

Outline: Preliminaries, Empirical Data Distributions, Summary Measures, Miscellany

## Preliminaries
Topics:
What is Statistics?
Typical Descriptive Statistics Problems
Note for the Student
It is recommended that students read this section in its
entirety before coming to class for the lecture to ensure that
they have the required background information.
During the lecture I will mainly focus on sections which have
a direct bearing on the lecture topic under discussion.
Material in the last section serves to complement what we
cover during the lecture.

Statistics Overview
Topics:
What is Statistics?
Applications of Statistics
Learning Objectives:
Learn the nature of Statistics and study its relevance to
Business Research Analysis and Decision Making.
Learn about the different subdisciplines of Statistics concerned
with extracting descriptive information from data, assessing
uncertainty and making statistical inferences & predictions.

What is Statistics?
Statistics is the discipline which makes use of mathematical and
computational techniques to, among other things,
collect data using surveys, observational studies or designed
experiments;
describe, summarize and present the collected data;
assess and quantify uncertainty;
draw inferences about population characteristics based on
sample information;
assess the statistical significance of observed differences or
presence of associations;
construct empirical models to obtain estimates, test
hypotheses or for predictive purposes;
make projections using cross-sectional or time series data.

Applications of Statistics
Some Applications:
Marketing Research
E.g. Assessing Brand Preferences for a Given Product
Finance
E.g. Measuring the Credit Risk of a Counterparty
Insurance
E.g. Measuring Risk of an Insurance Portfolio
Reliability Engineering
E.g. Assessing the Reliability of an Aircraft Engine
Medical Research
E.g. Determining the Efficacy of a New Drug
Q: Do you think Statistics is worthwhile learning? If so, why?

## Typical Descriptive Statistics Problems

Organizing Data
Forty students in an Introductory Statistics course were asked to
state their political affiliations (i.e., whether they favoured the
Democratic (D), Republican (R) or Other (O) party). The
following results were obtained.

D  R  O  R  R  R  R  R
D  O  R  D  O  O  R  D
D  R  O  D  R  R  O  R
D  O  D  D  D  R  O  D
O  R  D  R  R  R  R  D

What type of data are we dealing with?
What can we say about the distribution of political affiliations?
Source: Adapted from Weiss (2012, p. 40).

Summarizing Data
Arterial blood pressures (in mm of mercury) for a sample of 16
children of diabetic mothers are given below.
81.6   84.1  87.6  82.8
82.0   88.9  86.7  96.4
84.6  104.9  90.8  94.0
69.4   78.9  75.2  91.0

What does the data tell you about the average blood pressure
of a child whose mother is diabetic?
What can we conclude about the variability of the blood
pressure measurements?
Source: Adapted from Weiss (2012, p. 95)

## Empirical Data Distributions

Topics:
Tabulating Data Distributions
Graphing Data Distributions
Learning Objectives:
Learn tabular and graphical techniques for organizing and
presenting data.
Learn how to choose among the available techniques for a
given problem in descriptive statistical analysis.
Note:
Much of the material in this and the next section is of a review nature.
We'll quickly review such material but spend more time on material
students are less familiar with.

## Tabulating Data Distributions

Tabulating Categorical Data
The first column of the table contains the possible categories
and the second column the corresponding absolute frequencies
(optionally, relative frequencies may also be given in another
column).
Example
Consider the political affiliation data given in the first illustrative
problem. Following is the frequency table for the data.

Affiliation | Abs Freq   Rel Freq
------------+--------------------
Democratic  |    13       0.325
Republican  |    18       0.450
Other       |     9       0.225
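
The frequency table above can be reproduced with a few lines of code. The following is a minimal sketch in Python (not the tool used to produce the slides); the variable names `responses`, `abs_freq` and `rel_freq` are illustrative choices, and the responses are typed in from the political affiliation example.

```python
from collections import Counter

# The 40 responses from the political affiliation example,
# typed in groups of five (D = Democratic, R = Republican, O = Other).
responses = list(
    "DDDDO" "ROROR" "ORODD" "RDDDR"
    "RORDR" "RORRR" "RROOR" "RDRDD"
)

abs_freq = Counter(responses)   # absolute frequency of each category
n = len(responses)              # total number of observations
rel_freq = {cat: count / n for cat, count in abs_freq.items()}

# Print one table row per category: label, absolute and relative frequency.
for cat in ("D", "R", "O"):
    print(f"{cat}  {abs_freq[cat]:>3}  {rel_freq[cat]:.3f}")
```

Because the relative frequencies are simply the counts divided by the sample size, they always sum to 1 (up to rounding).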

## Tabulating Numerical Data

In an absolute frequency table, the number of observations in
each class (i.e., pre-defined sub-interval) is presented.

Abs Frequency Table (general form)

Class          | Frequency
---------------+----------
$(l_1, u_1]$   | $n_1$
$(l_2, u_2]$   | $n_2$
$(l_3, u_3]$   | $n_3$
...            | ...
$(l_k, u_k]$   | $n_k$

Abs Frequency Table (example)

Class     | Frequency
----------+----------
(10, 20]  |     3
(20, 30]  |     7
(30, 40]  |     4
(40, 50]  |     4
(50, 60]  |     2

Note: (10, 20] refers to values between 10 (exclusive) and 20 (inclusive) etc.

## Example [Frequency Tables]

The absolute frequency table in the previous slide was obtained
from the following raw data
12 13 17 21 24 24 26 27 27 30
32 35 37 38 41 43 44 46 53 58
The corresponding relative and cumulative frequency tables are:
Class     | Rel Freq   Cum Freq
----------+--------------------
(10, 20]  |   0.15       0.15
(20, 30]  |   0.35       0.50
(30, 40]  |   0.20       0.70
(40, 50]  |   0.20       0.90
(50, 60]  |   0.10       1.00
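
As a rough illustration of how these tables follow from the raw data, here is a small Python sketch. It assumes the right-closed classes (lo, hi] used above; the names `classes`, `abs_freq`, `rel_freq` and `cum_freq` are illustrative only.

```python
# Raw data from the frequency table example.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

# Count observations in each right-closed class (lo, hi]:
# lo is excluded, hi is included.
abs_freq = [sum(1 for x in data if lo < x <= hi) for lo, hi in classes]

n = len(data)
rel_freq = [f / n for f in abs_freq]

# Cumulative relative frequency: running total of counts divided by n.
cum_freq = []
running = 0
for f in abs_freq:
    running += f
    cum_freq.append(running / n)

for (lo, hi), a, r, c in zip(classes, abs_freq, rel_freq, cum_freq):
    print(f"({lo}, {hi}]  {a:>2}  {r:.2f}  {c:.2f}")
```

Note that the value 30 falls in (20, 30] rather than (30, 40], which is exactly what the inclusive upper limit means.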

## Graphing Data Distributions

Graphing Distributions for Categorical Data
Pie Chart
A circle is divided into pie slices. The area of each slice is
proportional to the relative frequency of each category.
Example
For the political affiliation data, we have the following pie chart.

Pie Slice  | Angle
-----------+--------
Democratic | 117 deg
Republican | 162 deg
Other      |  81 deg

Q: How can we improve on this graphical display?

Bar Chart
Each category is represented by a vertical (or horizontal) bar.
The height (or width) of each bar is equal or proportional to
the absolute or relative frequency of a category.
Example
For the political affiliation data, we have the following bar chart.

## Side-by-Side Bar Chart

This chart may be used to present bivariate categorical data.
Example [Side-by-Side Bar Chart]
Consider the following distribution of student grades by gender.
       |  A   B   C   D   E
-------+--------------------
Female |  3   9   7   1   1
Male   |  4   6   5   3   1

In relative terms, we have the following table.

       |   A     B     C     D     E
-------+------------------------------
Female | 0.14  0.43  0.33  0.05  0.05
Male   | 0.21  0.32  0.26  0.16  0.05
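
The relative table is obtained by dividing each count by its row total (21 females, 19 males). A minimal Python sketch of that step follows; the dictionary layout and the names `counts` and `proportions` are assumptions made for illustration.

```python
grades = ["A", "B", "C", "D", "E"]
counts = {
    "Female": [3, 9, 7, 1, 1],
    "Male":   [4, 6, 5, 3, 1],
}

# Divide each count by its row total to get within-gender proportions.
proportions = {
    gender: [c / sum(row) for c in row]
    for gender, row in counts.items()
}

print("       ", "     ".join(grades))
for gender, row in proportions.items():
    print(f"{gender:<7}", "  ".join(f"{p:.2f}" for p in row))
```

Working with proportions matters here because the two groups have different sizes, so the raw counts are not directly comparable.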

## Example [Side-by-Side Bar Chart] (contd)

Information in the first (second) table may be displayed by the
chart in the left (right) panel of the following figure.

Q: What conclusion(s) can be drawn from the above figure?
Q: Does it matter which chart you base your conclusions on?
Source: Adapted from Chow et al. (2007, p. 7).

## Graphing Distributions for Numerical Data

Absolute Frequency Histogram
Displays information contained in an absolute frequency table
using vertical bars with no gaps between bars.
The height of each bar gives the number of observations that
lie in the interval determined by the base of the bar.
Example
Class     | Frequency
----------+----------
(10, 20]  |     3
(20, 30]  |     7
(30, 40]  |     4
(40, 50]  |     4
(50, 60]  |     2

## Relative Frequency Histogram

Displays information in a relative frequency table by vertical
bars with no gaps between bars.
The area of each bar gives the fraction of observations that lie
in the interval determined by the base of the bar.
Example
Class     | Frequency
----------+----------
(10, 20]  |   0.15
(20, 30]  |   0.35
(30, 40]  |   0.20
(40, 50]  |   0.20
(50, 60]  |   0.10

## Cumulative Frequency Polygon

Displays a plot of cumulative frequency against upper class limit in
an expanded cumulative frequency table (as illustrated below).
Example

Class     | Cum Freq (%)
----------+-------------
(0, 10]   |      0
(10, 20]  |     15
(20, 30]  |     50
(30, 40]  |     70
(40, 50]  |     90
(50, 60]  |    100

Digression: Quartiles
Let $x_1, x_2, \ldots, x_n$ denote a set of $n$ observations for our study.
Usually, the $x_i$'s are unordered.
For some applications, we need to work with ordered values in the
dataset, i.e., with $x_{(i)}$'s such that
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}.$$
Define
$$Q_2 = \text{second quartile of the } x_i\text{'s} =
\begin{cases}
\tfrac{1}{2}\left(x_{(k)} + x_{(k+1)}\right), & \text{if } n = 2k, \\
x_{(k+1)}, & \text{if } n = 2k + 1.
\end{cases}$$
Note that $Q_2$ is also referred to as the median of the $x_i$'s.

The first quartile, denoted Q1, may be defined as the median of the xi
values less than or equal to Q2.
The third quartile, denoted Q3, may be defined as the median of the
xi values greater than or equal to Q2.
Example
For the following set of 5 observations
101.96, 109.76, 99.63, 99.76, 100.22
the corresponding ordered sample is
99.63, 99.76, 100.22, 101.96, 109.76.
Here,
Q1 = 99.76, Q2 = 100.22 and Q3 = 101.96.
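
The quartile definition used in these slides can be sketched in Python as follows. This is only an illustration of the rule stated above (Q1 and Q3 as medians of the values at or below, and at or above, Q2); the function names `median` and `quartiles` are illustrative, and statistical software often uses other quartile conventions that can give slightly different answers.

```python
def median(values):
    """Median using the n = 2k / n = 2k + 1 rule from the slides."""
    s = sorted(values)
    n = len(s)
    k = n // 2
    if n % 2 == 1:
        return s[k]                  # n = 2k + 1: the (k+1)-th ordered value
    return (s[k - 1] + s[k]) / 2     # n = 2k: average of k-th and (k+1)-th


def quartiles(values):
    """Return (Q1, Q2, Q3), with Q1 (Q3) the median of values <= (>=) Q2."""
    q2 = median(values)
    q1 = median([x for x in values if x <= q2])
    q3 = median([x for x in values if x >= q2])
    return q1, q2, q3


sample = [101.96, 109.76, 99.63, 99.76, 100.22]
print(quartiles(sample))
```

For the five observations in the example, the function returns the same three quartiles as worked out by hand.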

## Stem and Leaf Diagram

A stem and leaf diagram (like the one shown below) is a graphical
display that shows the distribution of a set of numerical values.
From it, one can
sometimes recover the original data;
easily infer empirical percentiles;
obtain measures of central tendency and dispersion.
Example
1 | 67788899
2 | 0012257
3 | 28
4 | 2

Ordered data: 16, 17, ..., 38, 42.
Distribution is right-skewed.
Q1 = 18, Q2 = 20 and Q3 = 23.5.
Min = 16 and Max = 42.
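
A display like the one above can be built mechanically by splitting each value into a stem (tens digit) and a leaf (units digit). The sketch below is a simplified Python illustration that assumes non-negative integer data and one line per stem; `stem_and_leaf` is an illustrative helper name, not a library function.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Walk through every stem from smallest to largest so that
    # stems without any leaves still appear in the display.
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{s} | {''.join(str(d) for d in leaves.get(s, []))}" for s in stems]


data = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20,
        21, 22, 22, 25, 27, 32, 38, 42]
for line in stem_and_leaf(data):
    print(line)
```

Since every leaf is kept, the original (integer) data can be read straight back off the display, which is what makes the "recover the original data" remark above possible.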

## Example [Stem and Leaf Display]

For the Cord Strength dataset

25  25  36  31  26  36  29  37  37  20
34  27  21  35  30  41  33  21  26  26
19  25  14  32  30  29  31  26  22  24
34  33  28  26  43  30  40  32  32  31
25  26  27  34  33  27  33  29  30  31

we obtain

1 | 4
1 | 9
2 | 01124
2 | 55556666667778999
3 | 000011112223333444
3 | 56677
4 | 013
Boxplots
We introduce the boxplot via a couple of examples.
Example [Boxplot]
Weekly television viewing times (in hours) of a sample of 20 people
are given below.
25  41  27  32  43
66  35  31  15   5
34  26  32  38  16
30  38  30  20  21

First, order the data and obtain the quartiles:

 5  15  16  20  21
25  26  27  30  30
31  32  32  34  35
38  38  41  43  66

Q1 = 23, Q2 = 30.5 and Q3 = 36.5.

## Example [Boxplot] (contd)

Then, determine the following limits
Lower Limit = Q1 - 1.5 × IQR = 2.75,
Upper Limit = Q3 + 1.5 × IQR = 56.75,
where IQR = 36.5 - 23 = 13.5. Finally, obtain 5 and 43 as the
adjacent values and note that 66 is a potential outlier since it falls
outside the interval (2.75, 56.75).

Note: Adjacent values are the most extreme values that lie within the lower and
upper limits; they are the most extreme observations that are not potential
outliers (Weiss, 2012, p. 120).
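
The boxplot calculations for the television viewing data can be checked with a short Python sketch. It assumes the quartile definition from the quartile digression (medians of the values at or below, and at or above, Q2); all names such as `tv_hours`, `adjacent` and `outliers` are illustrative.

```python
def median(values):
    """Median of a list, averaging the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    k = n // 2
    return s[k] if n % 2 == 1 else (s[k - 1] + s[k]) / 2


tv_hours = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
            34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

# Quartiles: Q1 (Q3) is the median of the values <= (>=) Q2.
q2 = median(tv_hours)
q1 = median([x for x in tv_hours if x <= q2])
q3 = median([x for x in tv_hours if x >= q2])

iqr = q3 - q1
lower = q1 - 1.5 * iqr      # lower limit
upper = q3 + 1.5 * iqr      # upper limit

# Adjacent values: the most extreme observations inside the limits.
inside = [x for x in tv_hours if lower <= x <= upper]
adjacent = (min(inside), max(inside))

# Potential outliers: observations outside the limits.
outliers = sorted(x for x in tv_hours if x < lower or x > upper)

print(q1, q2, q3, iqr, lower, upper, adjacent, outliers)
```

The whiskers of the boxplot extend to the adjacent values, while each potential outlier is plotted as an individual point.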

## Example [Parallel Boxplots]

Measurements on skinfold thickness (in mm) for samples of
runners and nonrunners in the same age group are given below.
Runners         |  Nonrunners
----------------+--------------------------
7.3  6.7  8.7   |  24.0  19.9   7.5  18.4
3.0  5.1  8.8   |  28.0  29.4  20.3  19.0
7.8  3.8  6.2   |   9.3  18.1  22.8  24.2
5.4  6.4  6.3   |   9.6  19.4  16.3  16.3
3.7  7.5  4.6   |  12.4   5.2  12.2  15.6

Statistics         | Runners                   | Nonrunners
-------------------+---------------------------+------------------------------
5 Num Summary      | 3.0, 4.85, 6.3, 7.4, 8.8  | 5.2, 12.3, 18.25, 21.55, 29.4
Limits             | 1.025, 11.225             | -1.575, 35.425
Adjacent Values    | 3.0, 8.8                  | 5.2, 29.4
Potential Outliers | None                      | None

Q: What conclusions can you draw from the above figure?

Source: Adapted from Weiss (2012, pp. 121-122)

Summary Measures
Topics:
Location & Spread of a Distribution
Measures of Central Tendency
Measures of Dispersion
Summary Measures for Grouped Data
Learning Objectives:
Learn how to measure the location and spread of the
distribution of raw data for a single numerical variable.
Learn how to obtain summary measures from grouped data.
Learn how to interpret and choose between the various
summary measures.
Learn the role played by robustness in the selection of a
summary measure.

## Measures of Central Tendency

Let $x_1, x_2, \ldots, x_n$ denote a set of $n$ observations with corresponding
ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
Some measures of central tendency are given below.
Mean
$$\text{mean} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \text{ say.}$$
Median
$$\text{median} =
\begin{cases}
\tfrac{1}{2}\left(x_{(k)} + x_{(k+1)}\right), & \text{if } n = 2k, \\
x_{(k+1)}, & \text{if } n = 2k + 1.
\end{cases}$$
Mode
mode = data value with highest frequency.

Example
Consider dataset
101.96, 109.76, 99.63, 99.76, 100.22
with corresponding ordered values
99.63, 99.76, 100.22, 101.96, 109.76.
Here, the mean is
$$\bar{x} = \frac{101.96 + 109.76 + 99.63 + 99.76 + 100.22}{5} \approx 102.27$$
and
$$\text{median} = x_{(3)} = 100.22.$$
Q: What about the mode?

Feature                     | Mean | Median | Mode
----------------------------+------+--------+-----
Always Exists?              |  Y   |   Y    |  N
Always Unique?              |  Y   |   N    |  N
Not Affected by Outliers?   |  N   |   Y    |  Y
Further Analysis Potential? |  Y   |   N    |  N

Note
Use a robust (i.e., resistant) measure of central tendency
when outlying values (assuming these are valid) are present.
The trimmed mean is an example of a robust measure of
location - see Exercise 3.54 on p. 101 of Weiss (2012) for a
specific illustration.
Q: What about the mean and median?

Example [Robustness]
The mean is not robust since it is affected by outlying (extreme)
observations.
> set.seed(2012)
> x <- rnorm(50, 10, 1)   # 50 draws from a normal with mean 10, sd 1
> mean(x)
[1] 10.03585
> median(x)
[1] 10.09504
Note that I've decided to stop using R for this course. You may ignore the R
code that you see in this and the next three examples.

## Example [Robustness] (contd)

> x <- sort(x)
> x[50] <- 30   # replace the largest observation with an extreme value
> mean(x)
[1] 10.37307
> median(x)
[1] 10.09504
The median is not affected by extreme observations and hence it is
a robust measure of central tendency.
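
The same effect can be seen without random numbers. The following Python sketch reuses the five observations from the earlier central tendency example and replaces the largest one with a deliberately extreme value; the value 300.0 is a hypothetical gross error chosen purely for illustration.

```python
from statistics import mean, median

sample = [99.63, 99.76, 100.22, 101.96, 109.76]

# Copy the sorted sample and overwrite its largest value
# with a hypothetical gross error.
contaminated = sorted(sample)
contaminated[-1] = 300.0

print("mean:  ", mean(sample), "->", mean(contaminated))
print("median:", median(sample), "->", median(contaminated))
```

The mean moves a long way towards the extreme value, whereas the median is unchanged because the middle ordered observation is still the same.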

## Relative Magnitude of Location Measures

Example
> table(x)
x
 1  2  3  4  5  6  7
 4  7 23 32 23  7  4
> mean(x)
[1] 4
> median(x)
[1] 4
The above example illustrates the case when
mean = median = mode.

In the next example, we have
mean < median = mode.
Example
> table(x)
x
 1  2  3  4  5  6  7
 2  4  7 12 15 33 27
> mean(x)
[1] 5.41
> median(x)
[1] 6

It is also possible that
mean > median = mode.
Example
> table(x)
x
 1  2  3  4  5  6  7
27 33 15 12  7  4  2
> mean(x)
[1] 2.59
> median(x)
[1] 2
Q: What is the practical significance of these examples?
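
For readers who prefer not to use R, the three frequency tables above can be checked with a Python sketch along the following lines. The helper `expand` and the dictionary labels are illustrative names; the counts are copied from the three examples.

```python
from statistics import mean, median, mode


def expand(freq):
    """Turn a {value: count} table into the underlying list of observations."""
    return [value for value, count in freq.items() for _ in range(count)]


tables = {
    "mean = median": {1: 4, 2: 7, 3: 23, 4: 32, 5: 23, 6: 7, 7: 4},
    "mean < median": {1: 2, 2: 4, 3: 7, 4: 12, 5: 15, 6: 33, 7: 27},
    "mean > median": {1: 27, 2: 33, 3: 15, 4: 12, 5: 7, 6: 4, 7: 2},
}

summary = {}
for label, freq in tables.items():
    obs = expand(freq)
    summary[label] = (mean(obs), median(obs), mode(obs))
    print(label, summary[label])
```

In each case the mean is pulled towards the longer tail of the distribution, while the median and mode stay with the bulk of the data.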

## Example [Mean vs Median]

The ordered sample and stem and leaf display for some data on
arterial blood pressure are given below.
69.4  75.2  78.9   81.6
82.0  82.8  84.1   84.6
86.7  87.6  88.9   90.8
91.0  94.0  96.4  104.9

 6 | 9
 7 | 59
 8 | 22345789
 9 | 1146
10 | 5

Here,
$\bar{x} = 86.18$ and median = 85.65.
Q: Which measure do you recommend for the data at hand?

Measures of Dispersion
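Some measures of dispersion are given below.
Range
$$\text{range} = x_{(n)} - x_{(1)}$$
Interquartile Range
$$\text{IQR} = \text{Third Quartile} - \text{First Quartile}$$
Variance
$$\text{variance} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
Standard Deviation
$$\text{standard deviation} = \sqrt{\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)}$$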

Example
Consider the (ordered) dataset
99.63, 99.76, 100.22, 101.96, 109.76.
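Here,
$$\text{range} = 109.76 - 99.63 = 10.13$$
and
$$\text{IQR} = 101.96 - 99.76 = 2.2.$$
Furthermore, using the unrounded mean $\bar{x} = 102.266$,
$$\text{variance} = \frac{99.63^2 + \cdots + 109.76^2 - 5 \times 102.266^2}{5 - 1} \approx 18.42$$
and
$$\text{standard deviation} \approx \sqrt{18.42} \approx 4.29.$$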
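
The variance can be computed either from the definition (squared deviations from the mean) or from the shortcut form used in the standard deviation formula. The Python sketch below, with illustrative variable names, checks that the two agree for this dataset and shows why the mean should not be rounded before it is squared.

```python
from math import sqrt

x = [99.63, 99.76, 100.22, 101.96, 109.76]
n = len(x)
xbar = sum(x) / n

# Definition: average squared deviation from the mean, with divisor n - 1.
var_definition = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) / (n - 1).
sum_sq = sum(xi ** 2 for xi in x)
var_shortcut = (sum_sq - n * xbar ** 2) / (n - 1)

# Same shortcut, but with the mean rounded to two decimals first.
var_rounded_mean = (sum_sq - n * 102.27 ** 2) / (n - 1)

sd = sqrt(var_definition)
print(round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
print(round(var_rounded_mean, 2))
```

The shortcut form subtracts two large, nearly equal numbers, so a small rounding error in the mean is magnified; carrying the mean at full precision avoids this.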

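A relative measure of dispersion is
$$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}}.$$
Example
For data in the previous example,
$$\text{coefficient of variation} = \frac{4.29}{102.27} \approx 0.04.$$

Feature                   | R | V | SD | IQR | CV
--------------------------+---+---+----+-----+----
Always Exists?            | Y | Y | Y  |  Y  | Y
Always Unique?            | Y | N | N  |  N  | N
Not Affected by Outliers? | N | N | N  |  Y  | N
Absolute Measure?         | Y | Y | Y  |  Y  | N
Same Units?               | Y | N | Y  |  Y  | N

R = range, V = variance, SD = standard deviation, IQR = interquartile range,
CV = coefficient of variation.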

## Example [Comparing Stock Performance]

Following are annual logarithmic returns of Microsoft (MSFT) and
Hewlett-Packard (HWP) for the period spanning 1995-1999.

     |   1995    1996    1997    1998    1999
-----+----------------------------------------
MSFT | 0.3644  0.6622  0.5026  0.7648  0.5290
HWP  | 0.5014  0.1836  0.2156  0.1864  0.4921

Some summary statistics for the returns are as follows:

            |   MSFT     HWP
------------+----------------
Mean        | 0.5646  0.3158
Std Dev     | 0.1539  0.1657
Median      | 0.5290  0.2156
IQR         | 0.1596  0.3057
Coef of Var | 0.2727  0.5246
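
The mean, standard deviation and coefficient of variation in the summary table can be reproduced with the Python sketch below. It uses the sample standard deviation (divisor n - 1), which is what the tabulated values correspond to; the dictionary names `returns` and `stats` are illustrative.

```python
from statistics import mean, median, stdev

returns = {
    "MSFT": [0.3644, 0.6622, 0.5026, 0.7648, 0.5290],
    "HWP":  [0.5014, 0.1836, 0.2156, 0.1864, 0.4921],
}

stats = {}
for ticker, r in returns.items():
    m = mean(r)
    s = stdev(r)                 # sample standard deviation (divisor n - 1)
    stats[ticker] = {
        "mean": m,
        "sd": s,
        "median": median(r),
        "cv": s / m,             # coefficient of variation
    }
    print(ticker, {k: round(v, 4) for k, v in stats[ticker].items()})
```

The two stocks have similar standard deviations, but HWP's lower mean return gives it roughly twice the coefficient of variation, i.e. more variability per unit of average return.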

## Mean & Variance for Grouped Data

Grouped data refers to data in a frequency distribution.
Example
Class    | Freq.   Percent     Cum.
---------+---------------------------
(10,15]  |    1      2.00      2.00
(15,20]  |    2      4.00      6.00
(20,25]  |    8     16.00     22.00
(25,30]  |   17     34.00     56.00
(30,35]  |   15     30.00     86.00
(35,40]  |    5     10.00     96.00
(40,45]  |    2      4.00    100.00
---------+---------------------------

Information in the first and any one of the remaining three
columns of the above table constitutes grouped data.

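Let
$m_i$ = midpoint of the $i$-th class,
$n_i$ = frequency of the $i$-th class,
$k$ = number of classes,
$n$ = total frequency.
The grouped data mean is
$$\bar{x}_g = \sum_{i=1}^{k} m_i \frac{n_i}{n} = \frac{1}{n}\sum_{i=1}^{k} m_i n_i.$$
The grouped data variance is
$$s_g^2 = \sum_{i=1}^{k} m_i^2 \frac{n_i}{n} - \bar{x}_g^2 = \frac{1}{n}\sum_{i=1}^{k} m_i^2 n_i - \bar{x}_g^2.$$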

Example
For the grouped data given earlier, we have

Class    |  ni     mi    mi*ni    mi^2*ni
---------+--------------------------------
(10,15]  |   1   12.5     12.5     156.25
(15,20]  |   2   17.5     35.0     612.50
(20,25]  |   8   22.5    180.0    4050.00
(25,30]  |  17   27.5    467.5   12856.25
(30,35]  |  15   32.5    487.5   15843.75
(35,40]  |   5   37.5    187.5    7031.25
(40,45]  |   2   42.5     85.0    3612.50
---------+--------------------------------
Total    |  50          1455.0   44162.50

Hence,
$$\bar{x}_g = \frac{1455.0}{50} = 29.1 \quad\text{and}\quad s_g^2 = \frac{44162.50}{50} - 29.1^2 = 36.44.$$
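
The grouped data calculation can be sketched in Python as follows. It assumes, as in the worked table, that each class is represented by its midpoint and that the variance uses divisor n; the names `mids`, `mean_g` and `var_g` are illustrative.

```python
classes = [(10, 15), (15, 20), (20, 25), (25, 30),
           (30, 35), (35, 40), (40, 45)]
freqs = [1, 2, 8, 17, 15, 5, 2]

# Represent every class by its midpoint.
mids = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)

sum_mn = sum(m * f for m, f in zip(mids, freqs))        # sum of m_i * n_i
sum_m2n = sum(m * m * f for m, f in zip(mids, freqs))   # sum of m_i^2 * n_i

mean_g = sum_mn / n
var_g = sum_m2n / n - mean_g ** 2

print(n, sum_mn, sum_m2n, round(mean_g, 2), round(var_g, 2))
```

Because the individual observations are replaced by class midpoints, these values are approximations to the mean and variance of the underlying raw data.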

Miscellany

Topics:
Summation Notation
Classification of Statistical Studies
Questions for Class Discussion
Learning Objectives:
Review the notation used for summation.
Learn about different types of statistical studies.

Summation Notation
Given numerical values $x_1, \ldots, x_n$, we have:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$$\sum_{i=1}^{n} (a x_i + b) = (a x_1 + b) + \cdots + (a x_n + b) = a \sum_{i=1}^{n} x_i + nb$$

Example
If the $x_i$'s are given by 1.75, 2.25, 2.25, 2.25, 1.75, 2.00, 1.50, we have
$$\sum_{i=1}^{7} x_i = 13.75 \quad\text{and}\quad \sum_{i=1}^{7} \ldots$$

## Classification of Statistical Studies

Observational Study
Observed relationships and other inferences apply only to
the study subjects (or objects) under investigation.
No control of extraneous sources of variation.
Example [Vasectomies & Prostate Cancer]
A study found an association between vasectomy and prostate
cancer: elevated risk after vasectomy.
No information that the study was based on a properly chosen
sample or a properly designed experiment.
We cannot infer causation nor generalize the observed association.
Source: Adapted from Weiss (2012, p. 7).

Inferential Study
The study is based on a properly chosen sample (e.g., random
sample).
Inferences made from sample information may be generalized
to a larger population.
Example [Testing Baseballs]
An independent testing company investigated the liveliness of 85
randomly selected Rawlings baseballs from the 1977 supplies of
major league teams.
The Rawlings baseball was found to be more lively than the 1976
Spalding baseball.
Source: Adapted from Weiss (2012, p. 6).

Designed Experiments
A proper randomization technique is used to allocate subjects
(or objects) to treatment and control groups.
Relevant sources of extraneous variation are controlled.
Example [Folic Acid & Birth Defects]
4753 women prior to conception were divided randomly into two
groups. One group took daily doses of folic acid while the other
took only trace elements.
Incidence of major birth defects was much reduced for the group
taking folic acid.
Here, we can infer presence of a causal relationship.
Source: Adapted from Weiss (2012, p. 7).

## Questions for Class Discussion

Question 1
A stem-and-leaf display of daily protein intake (in grams) for a
sample of 51 female vegetarians is shown below.
The decimal point is 1 digit(s) to the right of the |
0
1
2
3
4
5
6
7
8

|
|
|
|
|
|
|
|
|

1259
34558
01889
013566688899
001235567
002234467899
88
05

Question 1 (contd)
A similar display for a sample of 53 female nonvegetarians is given
below.
The decimal point is 1 digit(s) to the right of the |
0 | 5
1 | 14
2 | 34557
3 | 4567779
4 | 0112444569
5 | 0003345577
6 | 0113334799
7 | 1157
8 | 1444

Question 1 (contd)
(a) The quartiles for both groups of females are partially given in
the following table. Fill in the missing entries in the table.

Group         | 1st Quartile   2nd Quartile   3rd Quartile
--------------+-------------------------------------------
Vegetarian    |     ___             39             ___
Nonvegetarian |      38            ___              63

(b) Based on information in (the completed) table, compare the
location and spread of the two sets of data.
(c) Identify potential outliers, if any, for each dataset. Do you
obtain results that are consistent with what you observe in the
stem-and-leaf displays?

Question 2
(a) Which of the following is not a property of the coefficient of
variation?
(i) It is not always unique.
(ii) It is resistant to outliers.
(iii) It is a relative measure.
(iv) It is not in the same units as the original data.
(b) The (arithmetic) mean computed from raw data is always
unique. The same is true of the mean computed from
grouped data. True or False?
(c) The sample mid-range is a robust measure of location. True
or False?

Question 3
Suppose you obtain the following five number summaries from
data on annual (percentage) returns for common stock and
government bonds over a fifteen year period.
Investment: Bonds
[1] -10.460   1.035   4.600  14.080  42.980

Investment: Stocks
[1] -25.930  -0.495  10.710  23.760  44.770

represent?

Question 3 (contd)
(b) One of the values given in the five number summary for the
bond returns looks unusual. Is it a potential outlier?
(c) Of the two financial instruments, which is preferred if your
primary investment objective is to choose the one that gives
you the greater level of return on average?
(d) Which is preferred if risk aversion is the key factor influencing
your choice of investment to make?
(e) Is there anything wrong with the following statement?
Under appropriate conditions, the coefficient of variation is a
useful measure to consider when making risk-reward trade-offs
amongst several investment alternatives.

Question 4
Consider the following absolute frequency distribution obtained
from data on distance (in miles) travelled to work for a random
sample of 50 workers.
Classes   | (10,20]  (20,30]  (30,40]  (40,50]
----------+------------------------------------
Frequency |     3       19       23        5
(a) Determine the grouped data variance using information
provided by the above empirical distribution.
(b) Determine one other grouped data measure of dispersion.

Acknowledgements

The current slides are based in part on material from:

Introductory Statistics (9th Edition) by Neil A. Weiss.
Introductory Statistics (2nd Edition) by H. K. Chow, A.
Ghosh, D. H. Y. Leung and Y. K. Tse.
The slides were produced using the Beamer class package and
MiKTeX (a public domain document preparation system).
Customized computations and graphics were produced using R (a
public domain statistical software package).
I am grateful to the developers of the above resources for making
them available.