This presentation is purely for academic purposes and does not carry
any commercial value.
All non-academic images used in this presentation are the property of their
respective image holder(s). Images are used only for indicative
purposes and do not carry any other meaning.
Dilbert on Statistics
Please follow this…
Descriptive Statistics: Tabular and Graphical Presentations
Chapter 2 – Descriptive Statistics: Tabular and Graphical Presentations
Business Statistics
www.pibm.in
Textbook
TEXTBOOK
Anderson, Sweeney, Williams, Camm, and Cochran (2014). Business Statistics (12th ed.). Cengage Learning.
NOTE: Most of the material is copied or adapted from this book. Please read the book for
additional explanation and understanding.
REFERENCE BOOK
Black, K. (2011). Applied Business Statistics: Making Better Business Decisions. Wiley.
VISUAL vs TEXT
https://www.deekit.com/5-reasons-to-draw-your-ideas/
Table of Contents
Colgate-Palmolive Company (Slide-1/4)
Colgate-Palmolive Company (Slide-2/4)
Colgate-Palmolive Company (Slide-3/4)
To control the problem of heavy detergent powder, limits are placed on the
acceptable range of powder density. Statistical samples are taken periodically,
and the density of each powder sample is measured. Data summaries are then
provided for operating personnel so that corrective action can be taken if
necessary to keep the density within the desired quality specifications.
Colgate-Palmolive Company (Slide-4/4)
A frequency distribution for the densities of 150 samples taken over a one-week
period and a histogram are shown in the accompanying table and figure.
Density levels above 0.40 are unacceptably high. The frequency distribution
and histogram show that the operation is meeting its quality guidelines, with
all of the densities less than or equal to 0.40. Managers viewing these statistical
summaries would be pleased with the quality of the detergent production
process. In this chapter, you will learn about tabular and graphical methods of
descriptive statistics such as frequency distributions, bar graphs, histograms,
stem-and-leaf displays, crosstabulations, and others. The goal of these
methods is to summarize data so that the data can be easily understood and
interpreted.
Colgate-Palmolive Company (Slide-3/3)
DESCRIPTIVE STATISTICS
Descriptive Statistics
Descriptive statistics is the analysis of data that helps us describe, show,
or summarize data in a meaningful way.
It is important because raw data are hard to visualize.
Example: if we are analyzing birth certificates, descriptive statistics
could include
Gender ratio (female vs. male)
Average weight of the baby (in kg)
Average age of the mother
Range of the mother's age
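As a minimal sketch of the birth-certificate example, the Python below computes the four summaries listed above. The records are hypothetical, made up purely for illustration, and the field layout (baby's sex, birth weight in kg, mother's age in years) is an assumption.

```python
from statistics import mean

# Hypothetical birth-certificate records (illustrative only, not real data):
# (baby's sex, birth weight in kg, mother's age in years)
records = [
    ("F", 3.1, 28),
    ("M", 3.4, 31),
    ("F", 2.9, 24),
    ("M", 3.6, 35),
    ("F", 3.2, 30),
]

females = sum(1 for sex, _, _ in records if sex == "F")
males = len(records) - females
weights = [weight for _, weight, _ in records]
mother_ages = [age for _, _, age in records]

print(f"Gender ratio (female : male) = {females} : {males}")
print(f"Average birth weight = {mean(weights):.2f} kg")
print(f"Average age of mother = {mean(mother_ages):.1f} years")
print(f"Range of mother's age = {min(mother_ages)} to {max(mother_ages)} "
      f"({max(mother_ages) - min(mother_ages)} years)")
```

With these made-up records the summaries are a 3 : 2 ratio, 3.24 kg, 29.6 years, and a range of 24 to 35 years.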
SUMMARIZING QUALITATIVE DATA
Summarizing Qualitative Data
1 Frequency Distribution
2 Relative Frequency
3 Percent Frequency
4 Bar Graph
5 Pie Chart
Frequency Distribution
Frequency Distribution – Example (Slide-1/2)
Data from a Sample of 50 Soft Drink Purchases
EXAMPLE
Construct and interpret a frequency distribution for the qualitative data shown in the table for a sample of 50 soft drink purchases. Coke Classic, Diet Coke, Dr. Pepper, Pepsi, and Sprite are five popular soft drinks.
How will you develop the solution?
What will be your inference?
Frequency Distribution – Example (Slide-2/2)
Count the number of times each soft drink appears and list the counts in a table as below.
Soft Drink      Frequency
Coke Classic        19
Diet Coke            8
Dr. Pepper           5
Pepsi               13
Sprite               5
Total               50
INFERENCE
The frequency distribution summarizes information about the popularity of the five soft drinks.
Coke Classic is the leader.
Pepsi is second.
Diet Coke is third.
Sprite and Dr. Pepper are tied for fourth.
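Counting how often each category appears is the whole computation behind a qualitative frequency distribution. A minimal Python sketch using `collections.Counter` follows; because the 50 raw purchases are not reproduced in these slides, it runs on a short hypothetical sample, and `frequency_distribution` is an illustrative helper name.

```python
from collections import Counter

def frequency_distribution(data):
    """Return (category, frequency) pairs, most frequent first.
    Ties keep the order in which categories first appear."""
    return Counter(data).most_common()

# Hypothetical sample of purchases (illustration only, not the slide's 50 purchases)
sample = ["Coke Classic", "Pepsi", "Coke Classic", "Sprite", "Diet Coke",
          "Pepsi", "Coke Classic", "Dr. Pepper"]

table = frequency_distribution(sample)
for drink, freq in table:
    print(f"{drink:<13}{freq:>3}")
print(f"{'Total':<13}{len(sample):>3}")
```

The frequencies always sum to the number of observations, which is a quick check that every purchase was counted exactly once.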
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total
number of data items belonging to the class.
$\text{Relative frequency of a class} = \dfrac{\text{Frequency of the class}}{n}$
where n is the number of observations.
Relative Frequency Distribution
Example – Relative & Percent Frequency
Soft Drink      Frequency   Relative Frequency   Percent Frequency
Coke Classic        19        19 ÷ 50 = 0.38           38 %
Diet Coke            8         8 ÷ 50 = 0.16           16 %
Dr. Pepper           5         5 ÷ 50 = 0.10           10 %
Pepsi               13        13 ÷ 50 = 0.26           26 %
Sprite               5         5 ÷ 50 = 0.10           10 %
Total               50                  1.00          100 %
From the percent frequency distribution, we see that 38% of the purchases
were Coke Classic, 16% of the purchases were Diet Coke, and so on.
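The relative and percent frequency columns can be reproduced from the frequencies alone. This short Python sketch recomputes them for the soft drink table, using the frequencies from the slides (n = 50).

```python
# Frequencies from the soft drink example (n = 50 purchases)
frequencies = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
               "Pepsi": 13, "Sprite": 5}

n = sum(frequencies.values())  # total number of observations

# relative frequency = frequency / n; percent frequency = relative frequency x 100
relative = {drink: freq / n for drink, freq in frequencies.items()}
percent = {drink: rf * 100 for drink, rf in relative.items()}

for drink, freq in frequencies.items():
    print(f"{drink:<13}{freq:>4}{relative[drink]:>8.2f}{percent[drink]:>7.0f} %")
print(f"{'Total':<13}{n:>4}{sum(relative.values()):>8.2f}{sum(percent.values()):>7.0f} %")
```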
Bar Graph
A bar graph is a graphical device for depicting qualitative data that have
been summarized in a frequency, relative frequency, or percent frequency
distribution.
On the horizontal axis we specify the labels that are used for each of the
classes.
A frequency, relative frequency, or percent frequency scale can be used for
the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the
height of the bar to the class's frequency, relative frequency, or percent frequency.
The bars are separated to emphasize the fact that each class is a separate
category.
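For a quick look without a plotting library, a bar graph can be drafted as text. The Python sketch below is a rough stand-in for a real chart, under stated assumptions: it draws the bars horizontally (so the class labels run down the side rather than along the horizontal axis) and puts a blank line between bars to keep each category separate. The frequencies are the soft drink counts from the slides; `text_bar_graph` is a hypothetical helper name.

```python
def text_bar_graph(freqs, symbol="#"):
    """Return a horizontal text bar graph: one bar per category whose length
    equals the category's frequency; blank lines keep the bars separated."""
    width = max(len(label) for label in freqs)
    bars = [f"{label:<{width}} | {symbol * freq} {freq}" for label, freq in freqs.items()]
    return "\n\n".join(bars)

soft_drinks = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
               "Pepsi": 13, "Sprite": 5}
print(text_bar_graph(soft_drinks))
```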
Example - Bar Graph
Pie Chart
The pie chart is a commonly used graphical
device for presenting relative frequency
distributions for qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle into sectors
that correspond to the relative frequency for
each class.
Since there are 360 degrees in a circle, a class
with a relative frequency of 0.25 would
consume 0.25(360) = 90 degrees of the circle.
Pie Chart - Example
Because a circle contains 360 degrees and Coke Classic has a relative frequency of 0.38,
the sector of the pie chart labeled Coke Classic consists of 0.38(360) = 136.8
degrees.
The sector of the pie chart labeled Diet Coke consists of 0.16(360) = 57.6 degrees.
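The same multiplication gives every sector. A minimal Python sketch, using the relative frequencies from the soft drink table, converts each into degrees and checks that the sectors add up to the full 360-degree circle; `sector_degrees` is an illustrative helper name.

```python
# Relative frequencies from the soft drink example
relative_frequencies = {
    "Coke Classic": 0.38,
    "Diet Coke": 0.16,
    "Dr. Pepper": 0.10,
    "Pepsi": 0.26,
    "Sprite": 0.10,
}

def sector_degrees(rel_freqs):
    """Convert relative frequencies to pie-chart sector angles (degrees)."""
    return {label: round(rf * 360, 1) for label, rf in rel_freqs.items()}

angles = sector_degrees(relative_frequencies)
for label, deg in angles.items():
    print(f"{label:<13}{deg:>7.1f} degrees")
print(f"{'Total':<13}{sum(angles.values()):>7.1f} degrees")
```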
Summarizing Quantitative Data
1 Frequency Distribution
2 Relative and Percent Frequency Distributions
3 Dot Plot
4 Histogram
5 Cumulative Distribution
6 Ogive
Frequency Distribution
Frequency distributions can be used for both qualitative and quantitative data.
For quantitative data, we must be careful in defining the nonoverlapping
classes to be used in the frequency distribution.
Frequency Distribution – Width of Classes
Determine Width of Classes
It is recommended that all classes have the same width. Thus, the choices
of the number of classes and the width of the classes are not independent
decisions.
Use the equation below to decide the class width. The approximate class
width given by the equation can be rounded to a more convenient value.
$\text{Approximate class width} = \dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$
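The usual textbook rule for the approximate class width (largest data value minus smallest data value, divided by the number of classes) is easy to sketch in Python. The values 33, 12, and 5 classes below are illustrative assumptions, and rounding up with `math.ceil` is one choice of "convenient value"; rounding up guarantees that the classes span the whole range of the data.

```python
import math

def approximate_class_width(largest, smallest, num_classes):
    """Approximate class width = (largest value - smallest value) / number of classes."""
    return (largest - smallest) / num_classes

# Illustrative (assumed) values: data from 12 to 33 grouped into 5 classes
approx = approximate_class_width(largest=33, smallest=12, num_classes=5)  # 21 / 5 = 4.2
width = math.ceil(approx)  # round up to a convenient whole-number width

print(f"approximate width = {approx}, chosen width = {width}")
```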
Frequency Distribution – Class Limits
Class limits must be chosen so that each data item belongs to one and only
one class.
The lower class limit identifies the smallest possible data value assigned to
the class.
The upper class limit identifies the largest possible data value assigned to
the class.
In developing frequency distributions for qualitative data, we did not need
to specify class limits because each data item naturally fell into a separate
class.
EXAMPLE: For the class 10-14, the lower class limit is 10 and the upper class limit is 14.
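Class limits can be checked mechanically: each value must fall inside exactly one class. This Python sketch uses the audit-time classes 10-14 through 30-34 from this chapter, treats the limits as inclusive whole days (an assumption that matches the integer class limits), and tallies a handful of hypothetical audit times; `class_of` and `grouped_frequencies` are illustrative names.

```python
# Audit-time classes: (lower class limit, upper class limit), both inclusive
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

def class_of(value, classes):
    """Return the single class (lower, upper) whose limits contain value."""
    matches = [(lo, hi) for lo, hi in classes if lo <= value <= hi]
    if len(matches) != 1:
        raise ValueError(f"{value} falls in {len(matches)} classes; limits must give exactly one")
    return matches[0]

def grouped_frequencies(data, classes):
    """Frequency of each class, in class order."""
    counts = {c: 0 for c in classes}
    for x in data:
        counts[class_of(x, classes)] += 1
    return counts

# Hypothetical audit times in days (illustration only)
times = [12, 15, 19, 22, 27, 14, 18]
freqs = grouped_frequencies(times, classes)
for (lo, hi), freq in freqs.items():
    print(f"{lo}-{hi}: {freq}")
```

A value such as 14.5 would raise `ValueError`, which is why these limits suit data recorded in whole days.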
Frequency Distribution – Steps (Slide-1/2)
Histogram - Example
The class with the greatest frequency is shown by the rectangle appearing above the class of 15–19 days. The height of the rectangle shows that the frequency of this class is 8. Unlike a bar graph, a histogram contains no natural separation between the rectangles of adjacent classes.
Classes for Audit Time (Days)   Frequency
10-14                               4
15-19                               8
20-24                               5
25-29                               2
30-34                               1
Histogram
Histogram – Skewness (Slide – 1/2)
Moderately Skewed to the LEFT: A histogram is skewed to the left if its tail
extends farther to the left.
This histogram is typical for exam scores, with no scores above 100%, most
of the scores above 70%, and only a few really low scores.
Histogram – Skewness (Slide – 2/2)
SYMMETRIC: In a symmetric histogram, the left tail mirrors the shape of
the right tail. Histograms for data found in applications are never perfectly
symmetric, but the histograms for many applications may be roughly
symmetric.
Data for SAT scores, heights and weights of people, and so on lead to
histograms that are roughly symmetric.
Highly Skewed to the RIGHT: This histogram was constructed from data on
the amount of customer purchases over one day at a women’s apparel
store. Data from applications in business and economics often lead to
histograms that are skewed to the right. For instance, data on housing
prices, salaries, purchase amounts, and so on often result in histograms
skewed to the right.
Cumulative Distribution
Example - Cumulative Distribution
The cumulative frequency for the class “less than or equal to 24” is simply
the sum of the frequencies for all classes with data values less than or
equal to 24.
For the frequency distribution in the table, the sum of the frequencies for
classes 10–14, 15–19, and 20–24 indicates that 4 + 8 + 5 = 17 data values
are “less than or equal to 24.”
Audit Time (Days)           Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
Less than or equal to 14             4                   4 ÷ 20 = 0.20                0.20 × 100 = 20
Less than or equal to 19            12                  12 ÷ 20 = 0.60                0.60 × 100 = 60
Less than or equal to 24            17                  17 ÷ 20 = 0.85                0.85 × 100 = 85
Less than or equal to 29            19                  19 ÷ 20 = 0.95                0.95 × 100 = 95
Less than or equal to 34            20                  20 ÷ 20 = 1.00                1.00 × 100 = 100
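The cumulative columns are running totals of the frequency column. A minimal Python sketch with `itertools.accumulate` rebuilds the table above from the audit-time frequencies (4, 8, 5, 2, 1).

```python
from itertools import accumulate

# Audit-time classes (upper class limits) and frequencies from the histogram example
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

cumulative = list(accumulate(frequencies))  # running totals of the frequencies
n = cumulative[-1]                          # last running total = number of observations

for upper, cum in zip(upper_limits, cumulative):
    rel = cum / n  # cumulative relative frequency
    print(f"Less than or equal to {upper}: {cum:>2}  {rel:.2f}  {rel * 100:.0f}")
```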
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis; the vertical axis shows one of the following:
cumulative frequencies,
cumulative relative frequencies, or
cumulative percent frequencies.
A point is plotted for the cumulative value (one of the above) of each class.
The plotted points are connected by straight lines.
Example - Ogive
Because the classes for the audit time data are 10–14, 15–19, 20–24, and so on, one-unit gaps appear from 14 to 15, 19 to 20, and so on. These gaps are eliminated by plotting points halfway between the class limits. Thus, 14.5 is used for the 10–14 class, 19.5 is used for the 15–19 class, and so on.
Audit Time (Days)            Cumulative Frequency
Less than or equal to 14              4
Less than or equal to 19             12
Less than or equal to 24             17
Less than or equal to 29             19
Less than or equal to 34             20
Audit Time (Days)            Point on X axis
Less than or equal to 14             14.5
Less than or equal to 19             19.5
Less than or equal to 24             24.5
Less than or equal to 29             29.5
Less than or equal to 34             34.5
Example - Ogive
The data below will be used to draw the ogive. A class preceding the first class is also taken
into consideration: “less than or equal to 9,” whose cumulative frequency is taken to be 0
(plotted at 9.5 on the horizontal axis).
Audit Time (Days)          Class for Audit Time   Point on X axis   Cumulative Frequency
Less than or equal to 14          10-14                14.5                  4
Less than or equal to 19          15-19                19.5                 12
Less than or equal to 24          20-24                24.5                 17
Less than or equal to 29          25-29                29.5                 19
Less than or equal to 34          30-34                34.5                 20
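The plotting positions can be generated rather than typed. The Python sketch below assumes integer class limits with one-unit gaps, so the point halfway between classes is the upper limit plus 0.5; it also adds the starting point (9.5, 0) for the “less than or equal to 9” class. `ogive_points` is a hypothetical helper name.

```python
from itertools import accumulate

# Audit-time classes (lower, upper) and frequencies from this chapter
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
frequencies = [4, 8, 5, 2, 1]

def ogive_points(class_limits, frequencies):
    """Return (x, cumulative frequency) points for an ogive.

    Assumes whole-number data with one-unit gaps between classes, so each point
    sits halfway between a class's upper limit and the next lower limit
    (upper + 0.5). The first point has cumulative frequency 0 and sits half a
    unit below the first lower limit.
    """
    points = [(class_limits[0][0] - 0.5, 0)]
    for (_, upper), cum in zip(class_limits, accumulate(frequencies)):
        points.append((upper + 0.5, cum))
    return points

print(ogive_points(class_limits, frequencies))
# -> [(9.5, 0), (14.5, 4), (19.5, 12), (24.5, 17), (29.5, 19), (34.5, 20)]
```

Joining these points with straight line segments in any plotting tool gives the ogive.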
Example - Ogive
The “less than or equal to 14” class, with a cumulative frequency of 4, is shown on the
ogive by the point located at 14.5 on the horizontal axis and 4 on the vertical axis.
Audit Time (Days)   X axis   Cumulative Frequency
<= 14               14.5              4
<= 19               19.5             12
<= 24               24.5             17
<= 29               29.5             19
<= 34               34.5             20
Example – Waiting Time for Patients
Example – Consumer Holiday Spending
EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the
distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing
the actual data values.
The leading digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line, we record the last digit of each item in rank
order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
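The construction just described can be sketched in Python for non-negative whole numbers: the stem is the value without its last digit (`value // 10`) and the leaf is the last digit (`value % 10`). The sketch, with the hypothetical helper `stem_and_leaf`, is applied to the 50 aptitude-test scores from the Haskens Manufacturing example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem-and-leaf display for non-negative integers: stem = all digits but
    the last, leaf = last digit; leaves on each stem are in rank order."""
    stems = defaultdict(list)
    for value in sorted(data):          # sorting puts the leaves in rank order
        stems[value // 10].append(value % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # show empty stems too
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return "\n".join(lines)

# Aptitude-test scores (questions answered correctly) for the 50 interviewees
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
          126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
          92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
          95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
          68, 98, 115, 106, 95, 100, 85, 94, 106, 119]

print(stem_and_leaf(scores))
```

The first stem comes out as `6 | 8 9` and the last as `14 | 1`, matching the lowest (68) and highest (141) scores.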
Example - Stem-and-Leaf Display (Slide-1/3)
These data result from a 150-question aptitude test given to 50 individuals recently interviewed for a position at Haskens Manufacturing. The data indicate the number of questions answered correctly.
112  72  69  97 107
 73  92  76  86  73
126 128 118 127 124
 82 104 132 134  83
 92 108  96 100  92
115  76  91 102  81
 95 141  81  80 106
 84 119 113  98  75
 68  98 115 106  95
100  85  94 106 119
Example - Stem-and-Leaf Display (Slide-2/3)
[Figure: the aptitude data repeated, with the lowest and highest numbers marked]
The data value 112 shows the leading digits 11 to the left of the line and the last digit 2 to
the right of the line.
Example - Stem-and-Leaf Display (Slide-3/3)
Sort the digits on each line into rank order.
The numbers to the left of the vertical line (6, 7, 8, 9, 10, 11, 12, 13, and 14) form the stem,
and each digit to the right of the vertical line is a leaf.
Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data
too much, we can stretch the display by using two or more stems
for each leading digit(s).
Whenever a stem value is stated twice, the first value corresponds
to leaf values of 0–4, and the second value corresponds to
leaf values of 5–9.
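Stretching only changes how each stem's leaves are split. This Python sketch (the helper name `stretched_stem_and_leaf` is hypothetical) puts leaves 0-4 on the first line of each stem and 5-9 on the second, drops empty half-stems before the smallest and after the largest value, and is applied to the aptitude-test scores from the earlier example.

```python
def stretched_stem_and_leaf(data):
    """Stretched stem-and-leaf display: each stem appears twice, the first line
    holding leaves 0-4 and the second line holding leaves 5-9."""
    values = sorted(data)
    halves = []  # (stem, leaves) pairs, two per stem
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = [v % 10 for v in values if v // 10 == stem]
        halves.append((stem, [leaf for leaf in leaves if leaf <= 4]))
        halves.append((stem, [leaf for leaf in leaves if leaf >= 5]))
    # Drop empty half-stems before the smallest value and after the largest value
    filled = [i for i, (_, leaves) in enumerate(halves) if leaves]
    halves = halves[filled[0]: filled[-1] + 1]
    return "\n".join(f"{stem:>3} | {' '.join(map(str, leaves))}".rstrip()
                     for stem, leaves in halves)

# Aptitude-test scores for the 50 interviewees at Haskens Manufacturing
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
          126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
          92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
          95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
          68, 98, 115, 106, 95, 100, 85, 94, 106, 119]

print(stretched_stem_and_leaf(scores))
```

Empty half-stems inside the range (here, the 5-9 line for stem 13) are kept so that gaps in the data stay visible.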
Example – Mini Marathon
The 2004 Naples, Florida, mini marathon (13.1 miles) had 1228 registrants (Naples
Daily News, January 17, 2004). Competition was held in six age groups. The
following data show the ages for a sample of 40 individuals who participated in the
marathon.
49 33 40 37 56
44 46 57 55 32
50 52 43 64 40
46 24 30 37 43
31 43 50 36 61
27 44 35 31 43
52 43 66 31 50
72 26 59 21 47
a. Show a stretched stem-and-leaf display.
b. What age group had the largest number of runners?
c. What age occurred most frequently?
d. A Naples Daily News feature article emphasized the number of runners who were
“20-something.” What percentage of the runners were in the 20-something age group?
e. What do you suppose was the focus of the article?
CROSSTABULATIONS AND SCATTER DIAGRAMS
Crosstabulations
Example – Crosstabulation (Slide-1/4)
The quality rating and the meal price data were collected for a sample of 300
restaurants located in the Los Angeles area. The table shows the data for the first
10 restaurants. Data on a restaurant’s quality rating and typical meal price are
reported.
Quality rating is a qualitative variable with rating categories of good,
very good, and excellent.
Meal price is a quantitative variable that ranges from $10 to $49.
Example – Crosstabulation (Slide-2/4)
The Quality Rating and Meal Price labels define the classes for the two
variables. In the left margin, the row labels (good, very good, and
excellent) correspond to the three classes of the quality rating variable.
In the top margin, the column labels ($10–19, $20–29, $30–39, and $40–49)
correspond to the four classes of the meal price variable.
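A crosstabulation is a two-way tally. This Python sketch counts (quality rating, meal price class) pairs and then converts one row to row percentages. Because the 300-restaurant data are not reproduced here, it runs on a few hypothetical restaurants; the helpers `price_class` and `row_percentages` are illustrative names, and whole-dollar prices between $10 and $49 are assumed.

```python
from collections import Counter

ratings = ["Good", "Very Good", "Excellent"]
price_classes = ["$10-19", "$20-29", "$30-39", "$40-49"]

def price_class(price):
    """Map a whole-dollar meal price ($10-$49) to its $10-wide class label."""
    lower = (price // 10) * 10
    return f"${lower}-{lower + 9}"

# Hypothetical (quality rating, meal price) records, not the actual sample
restaurants = [("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
               ("Very Good", 33), ("Good", 17), ("Very Good", 25), ("Excellent", 44)]

# Crosstabulation: count of restaurants in each (rating, price class) cell
counts = Counter((rating, price_class(price)) for rating, price in restaurants)

def row_percentages(rating):
    """Each cell in a rating's row as a percentage of that row's total."""
    row = [counts[(rating, pc)] for pc in price_classes]
    total = sum(row)
    return [round(100 * x / total, 1) for x in row]

print(f"{'Quality':<10}" + "".join(f"{pc:>8}" for pc in price_classes) + f"{'Total':>7}")
for rating in ratings:
    row = [counts[(rating, pc)] for pc in price_classes]
    print(f"{rating:<10}" + "".join(f"{x:>8}" for x in row) + f"{sum(row):>7}")
print("Row % for Good:", row_percentages("Good"))
```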
Example – Crosstabulation (Slide-3/4)
Row Percentages for Each Quality Rating Category
Example – Crosstabulation (Slide-4/4)
Example – Scatter Diagram
On 10 occasions during the past three months, the store used weekend television
commercials to promote sales at its stores. The managers want to investigate
whether a relationship exists between the number of commercials shown and sales
at the store during the following week. Sample data for the 10 weeks, with sales in
hundreds of dollars, are shown in the table.
TYPES OF RELATIONSHIPS DEPICTED BY SCATTER DIAGRAMS
Example
The crosstabulation shows household income by educational level of the head of household
(Statistical Abstract of the United States: 2002).
1. Compute the row percentages and identify the percent frequency distributions of income for households in
which the head is a high school graduate and in which the head holds a bachelor’s degree.
2. What percentage of households headed by high school graduates earn $75,000 or more? What
percentage of households headed by bachelor’s degree recipients earn $75,000 or more?
3. Construct percent frequency histograms of income for households headed by persons with a high
school degree and for those headed by persons with a bachelor’s degree. Is any relationship
evident between household income and educational level?
Tabular and Graphical Procedures
[Figure: chart of tabular and graphical procedures, organized by data type: Qualitative Data, Quantitative Data]
HOMEWORK PROBLEMS
Example
CSM Worldwide forecasts global production for all automobile manufacturers. The
following CSM data show the forecast of global auto production for General Motors, Ford,
DaimlerChrysler, and Toyota for the years 2004 to 2007 (USA Today, December 21, 2005).
Data are in millions of vehicles.
Construct a time series graph for the years 2004 to 2007 showing the number of
vehicles manufactured by each automotive company. Show the time series for all four
manufacturers on the same graph.
General Motors has been the undisputed production leader of automobiles since 1931.
What does the time series graph show about who is the world’s biggest car company?
Discuss.
Construct a bar graph showing vehicles produced by automobile manufacturer using
the 2007 data. Is this graph based on cross-sectional or time series data?