Stem and Leaf

Presenting Data
Week 2
Objectives
On completion of this module you should be able to:
- produce a stem-and-leaf plot (by hand and using Excel/PHStat2)
- construct a frequency distribution (by hand and using Excel/PHStat2)
- plot a histogram, ogive and scatterplot (by hand and using Excel/PHStat2)
- graph a bar chart, pie chart and grouped (side-by-side) bar chart (by hand and using Excel/PHStat2)
Objectives
On completion of this module you should also be able to:
- interpret the data presentations listed above, and apply the results and conclusions to real-world examples
- discover and describe common graphical errors, and explain how to overcome them.
Example 2-1
The following data represent the actual weight of potato chips found in bags labelled 50 grams. The manufacturer aims to overfill the bags by 5 grams to allow for settling and dehydrating of the chips prior to sale. The results of fill weights in a sample of 20 consecutive 50-gram bags are listed below (reading from left to right in the order of being filled):
59.4 56.8 56.0 57.9 59.2 51.7 57.5 54.8 52.6 51.5 51.6 55.7 53.7 54.1 59.6 52.4 55.6 54.4 50.2 56.1
(a) Stem-and-leaf
First create an ordered array (order data from smallest to largest).
50.2 51.5 51.6 51.7 52.4 52.6 53.7 54.1 54.4 54.8
55.6 55.7 56.0 56.1 56.8 57.5 57.9 59.2 59.4 59.6
Choose the stems. It is probably easiest to use the first two digits: 50, 51, 52, 53, 54, ..., 59. The leaves will then be the digit after the decimal point: 2, 5, 6, 7, 4, ...
Stem-and-leaf
Write the stems down the left hand side:
50 |
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
Stem-and-leaf
First data point is 50.2, so add 2 after 50.
50 | 2
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
Stem-and-leaf
Next data point is 51.5, so add 5 after 51.
50 | 2
51 | 5
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
Stem-and-leaf
Continue until all data is added.
50 | 2
51 | 5 6 7
52 | 4 6
53 | 7
54 | 1 4 8
55 | 6 7
56 | 0 1 8
57 | 5 9
58 |
59 | 2 4 6
Stem-and-leaf display using PHStat2

Stem-and-Leaf Display
Stem unit: 1
50 | 2
51 | 567
52 | 46
53 | 7
54 | 148
55 | 67
56 | 018
57 | 59
58 |
59 | 246
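The ordered-array procedure can also be sketched in Python as a check on a hand-drawn display. This is a minimal illustration, not how PHStat2 works internally: the function name `stem_and_leaf` is hypothetical, it assumes non-negative data recorded to one decimal place (leaf unit 0.1, stem unit 1), and it uses 54.4 as in the ordered array.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=0.1):
    """Return an ordered {stem: leaves} mapping covering every stem in range.

    Each value is rounded to a whole number of leaf units (here tenths of a
    gram); the last digit of that number is the leaf, the rest is the stem.
    Assumes non-negative values.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    leaves = defaultdict(str)
    for n in scaled:
        stem, leaf = divmod(n, 10)
        leaves[stem] += str(leaf)  # leaves arrive in ascending order
    first, last = scaled[0] // 10, scaled[-1] // 10
    # Keep empty stems (e.g. 58) so gaps in the data stay visible.
    return {stem: leaves.get(stem, "") for stem in range(first, last + 1)}


# Bag weights (grams) as listed in the ordered array.
bag_weights = [50.2, 51.5, 51.6, 51.7, 52.4, 52.6, 53.7, 54.1, 54.4, 54.8,
               55.6, 55.7, 56.0, 56.1, 56.8, 57.5, 57.9, 59.2, 59.4, 59.6]

for stem, leaf_digits in stem_and_leaf(bag_weights).items():
    print(f"{stem} | {leaf_digits}")
```

Keeping the empty stem 58 matters: dropping it would hide the gap between 57.9 and 59.2.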
Choosing stems and leaves for messy data

Often there are many possible ways to choose the stems. Let's look at some examples.

Data set 1: 0.0149, 0.9832, 0.2532, 0.4501, 0.7019, ...
One suggestion is to round the numbers to 2 decimal places: 0.01, 0.98, 0.25, 0.45, 0.70, ...
Use the 1st digit after the decimal point as the stem (0, 9, 2, 4, 7, ...) and the 2nd as the leaf (1, 8, 5, 5, 0, ...).

Data set 2: 394.235, 388.583, 392.891, 393.998, 397.852, ...
Round the numbers to 1 decimal place: 394.2, 388.6, 392.9, 394.0, 397.9, ...
Notice how rounding has affected these numbers (e.g. 393.998 rounds to 394.0, so it belongs on stem 394, not 393).
Use the first three digits as the stems (394, 388, 392, 394, 397, ...) and the 1st digit after the decimal point as the leaves (2, 6, 9, 0, 9, ...).

Data set 3: 190653, 121987, 154028, 161923, ...
Round the numbers to 3 significant figures: 191000, 122000, 154000, 162000, ...
Use the first two digits as the stems (19, 12, 15, 16, ...) and the 3rd digit as the leaves (1, 2, 4, 2, ...).
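The rounding rules for the three data sets can be expressed as one step: round to the place value of the leaf, then split off the last digit. A minimal sketch, assuming non-negative data; `split_stem_leaf` is a hypothetical helper, and the leaf unit is passed as a string ("0.01", "0.1" or "1000"; rounding these six-digit numbers to 3 significant figures corresponds to a leaf unit of 1000).

```python
from decimal import Decimal, ROUND_HALF_UP


def split_stem_leaf(value, leaf_unit):
    """Round value to the nearest leaf_unit and split it into (stem, leaf).

    leaf_unit is the place value of the leaf digit given as a string, e.g.
    "0.01" (data set 1), "0.1" (data set 2) or "1000" (data set 3).
    Decimal with ROUND_HALF_UP mimics rounding by hand (5 rounds up) and
    avoids binary floating-point surprises. Assumes non-negative values.
    """
    units = (Decimal(str(value)) / Decimal(leaf_unit)).quantize(
        Decimal(1), rounding=ROUND_HALF_UP)
    stem, leaf = divmod(int(units), 10)
    return stem, leaf


print([split_stem_leaf(v, "0.01") for v in (0.0149, 0.9832, 0.2532, 0.4501, 0.7019)])
print([split_stem_leaf(v, "0.1") for v in (394.235, 388.583, 392.891, 393.998, 397.852)])
print([split_stem_leaf(v, "1000") for v in (190653, 121987, 154028, 161923)])
```

Note how 393.998 comes out as stem 394, leaf 0, exactly as in the hand calculation.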
(b) Construct the frequency distribution
Data range:
$$\text{Range} = 59.6 - 50.2 = 9.4$$
This is a small data set, so we choose a small number of classes: 8.
Width of interval:
$$\text{Width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{9.4}{8} = 1.175$$
It is easier to round this number up to 1.2 (since the data are given to 1 decimal place). Read the information on classes and boundary points in the study guide (p. 2-7).
(b) Frequency distribution
Now construct a table and tally data:

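Weight of bag (gm)       | Tally | Number of bags
50.2 to less than 51.4   | /     | 1
51.4 to less than 52.6   | ////  | 4
52.6 to less than 53.8   | //    | 2
53.8 to less than 55.0   | ///   | 3
55.0 to less than 56.2   | ////  | 4
56.2 to less than 57.4   | /     | 1
57.4 to less than 58.6   | //    | 2
58.6 to less than 59.8   | ///   | 3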
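Tallying amounts to counting how many values fall in each "lower to less than upper" class. A minimal sketch, assuming the weights are recorded to one decimal place (54.4 is used, as in the ordered array); `frequency_distribution` is a hypothetical helper.

```python
def frequency_distribution(values, boundaries):
    """Count values in each class 'lower to less than upper'.

    Values and boundaries are compared as integer tenths, so a value equal
    to a boundary (e.g. 52.6) is counted in the class that starts there.
    """
    v10 = [round(v * 10) for v in values]
    b10 = [round(b * 10) for b in boundaries]
    return [sum(lower <= v < upper for v in v10)
            for lower, upper in zip(b10, b10[1:])]


# Fill weights in the order the bags were filled (grams).
weights = [59.4, 56.8, 56.0, 57.9, 59.2, 51.7, 57.5, 54.8, 52.6, 51.5,
           51.6, 55.7, 53.7, 54.1, 59.6, 52.4, 55.6, 54.4, 50.2, 56.1]
boundaries = [50.2, 51.4, 52.6, 53.8, 55.0, 56.2, 57.4, 58.6, 59.8]

counts = frequency_distribution(weights, boundaries)
for lower, upper, count in zip(boundaries, boundaries[1:], counts):
    print(f"{lower:.1f} to less than {upper:.1f} | {'/' * count:<4} | {count}")
```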
(b) Frequency distribution & percentage distribution

Weight of bag (gm)       | Number of bags | Percentage of bags
50.2 to less than 51.4   | 1              | (1/20) × 100 = 5%
51.4 to less than 52.6   | 4              | (4/20) × 100 = 20%
52.6 to less than 53.8   | 2              | (2/20) × 100 = 10%
53.8 to less than 55.0   | 3              | (3/20) × 100 = 15%
55.0 to less than 56.2   | 4              | (4/20) × 100 = 20%
56.2 to less than 57.4   | 1              | (1/20) × 100 = 5%
57.4 to less than 58.6   | 2              | (2/20) × 100 = 10%
58.6 to less than 59.8   | 3              | (3/20) × 100 = 15%
Total                    | 20             | 100%
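Each percentage is the class frequency divided by the total number of bags, times 100. A minimal sketch (the helper name `percentage_distribution` is hypothetical), using the frequencies from the table:

```python
def percentage_distribution(counts):
    """Percentage of observations in each class: (count / total) x 100."""
    total = sum(counts)
    return [count * 100 / total for count in counts]


counts = [1, 4, 2, 3, 4, 1, 2, 3]   # number of bags per class, from the table
percentages = percentage_distribution(counts)
print(percentages)        # one value per class
print(sum(percentages))   # should be 100
```

Multiplying by 100 before dividing keeps these particular results exact.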
(c) Frequency histogram

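[Figure: Histogram. Frequency (0 to 5) on the vertical axis; bins labelled by the class boundaries 50.2, 51.4, 52.6, 53.8, 55, 56.2, 57.4, 58.6, 59.8 on the horizontal axis.]

[Figure: Histogram of Bag Weights. Frequency (0 to 4.5) on the vertical axis; class midpoints 50.8, 52, 53.2, 54.4, 55.6, 56.8, 58, 59.2 on the horizontal axis.]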
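The second histogram labels each bar with its class midpoint, the average of the two class boundaries. A minimal sketch that computes the midpoints and prints a sideways text histogram as a rough by-hand check; it assumes the eight classes of width 1.2 starting at 50.2 and the frequencies from part (b).

```python
# Class boundaries in integer tenths of a gram: 50.2, 51.4, ..., 59.8.
boundaries_tenths = [502 + 12 * i for i in range(9)]
counts = [1, 4, 2, 3, 4, 1, 2, 3]   # frequencies from the frequency distribution

# Midpoint of each class = (lower boundary + upper boundary) / 2, in grams.
midpoints = [(lower + upper) / 20
             for lower, upper in zip(boundaries_tenths, boundaries_tenths[1:])]

# A sideways text histogram: one '*' per bag in each class.
for midpoint, count in zip(midpoints, counts):
    print(f"{midpoint:5.1f} | {'*' * count}")
```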
We will discuss how to produce histograms using Excel and PHStat2 during workshops. Instructions are in the text and Excel Handbook sections included in the text. Make sure you can produce histograms (and other graphs) by hand as well!!!
(d) Percentage polygon
[Figure: Percentage Polygon. Percentage of bags (0% to 25%) plotted against the class midpoints 50.8, 52, 53.2, 54.4, 55.6, 56.8, 58, 59.2.]
(e) Cumulative percentage distribution

Weight of bag (gm)       | Percentage of bags | Cumulative percentage
50.2 to less than 51.4   | 5                  | 5
51.4 to less than 52.6   | 20                 | 25
52.6 to less than 53.8   | 10                 | 35
53.8 to less than 55.0   | 15                 | 50
55.0 to less than 56.2   | 20                 | 70
56.2 to less than 57.4   | 5                  | 75
57.4 to less than 58.6   | 10                 | 85
58.6 to less than 59.8   | 15                 | 100
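The cumulative percentages are running totals of the class percentages. The sketch below, a minimal illustration using the class frequencies from part (b), accumulates the counts first and converts to percentages at the end, so rounding in individual percentages cannot build up.

```python
from itertools import accumulate

counts = [1, 4, 2, 3, 4, 1, 2, 3]   # bags per class
total = sum(counts)

# Running totals of the counts: 1, 5, 7, ... 20.
cumulative_counts = list(accumulate(counts))
cumulative_pct = [c * 100 / total for c in cumulative_counts]
print(cumulative_pct)
```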
(f) Cumulative percentage polygon (ogive)

[Figure: Cumulative Percentage Polygon. Cumulative percentage (0% to 120%) on the vertical axis; points at 50.19, 51.39, 52.59, 53.79, 54.99, 56.19, 57.39, 58.59, 59.79 on the horizontal axis.]
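An ogive plots, at each class boundary, the percentage of bags weighing less than that boundary. The chart above places its points just below each boundary (50.19, 51.39, ...); the minimal sketch below instead pairs the boundaries themselves with the cumulative percentages, starting from 0% at the lowest boundary. The frequencies are those from part (b).

```python
from itertools import accumulate

boundaries = [50.2, 51.4, 52.6, 53.8, 55.0, 56.2, 57.4, 58.6, 59.8]
counts = [1, 4, 2, 3, 4, 1, 2, 3]
total = sum(counts)

# 0% of bags weigh less than the lowest boundary; after that, the value at
# each upper boundary is the percentage of bags weighing less than it.
cumulative_pct = [0.0] + [c * 100 / total for c in accumulate(counts)]
ogive_points = list(zip(boundaries, cumulative_pct))
for x, y in ogive_points:
    print(f"less than {x:.1f} g: {y:.0f}%")
```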
Solution 2-1
(g) On the basis of the results of (a) through (f), does there appear to be any concentration of the bag weights around specific values?
There does not appear to be any concentration around specific values: there are no obvious outliers and no obvious patterns, and the data seem fairly evenly distributed from about 50 to 60 grams.
Solution 2-1
(h) If you had to make a prediction of the weight of potato chips in the next bag, what would you predict? Why?
The best prediction would be somewhere around the middle of the data (because there is no obvious trend or pattern): we could predict about 55 grams. Note: we will learn how to make more accurate forecasts later in the course.
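"The middle of the data" can be made concrete with the mean and the median. A minimal sketch using Python's standard `statistics` module, assuming the weights as in the ordered array (with 54.4):

```python
import statistics

# Fill weights (grams), in the order the bags were filled.
weights = [59.4, 56.8, 56.0, 57.9, 59.2, 51.7, 57.5, 54.8, 52.6, 51.5,
           51.6, 55.7, 53.7, 54.1, 59.6, 52.4, 55.6, 54.4, 50.2, 56.1]

mean = statistics.mean(weights)
median = statistics.median(weights)   # average of the 10th and 11th ordered values
print(f"mean = {mean:.2f} g, median = {median:.1f} g")
```

Both measures come out close to 55 grams, consistent with the prediction above.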
Example 2-2
In recent years, the cost of holiday accommodation on a particular island has been increasing. There was, however, a reduction as a reaction to reduced air travel in the aftermath of the attacks of September 11, 2001. Since then, rising fuel costs have increased the cost of commercial flights and so further discouraged travel to the island, but despite this, the cost of accommodation has continued to increase.
Example 2-2
These data represent the cost of a double room for one night's accommodation on the island for the years 1995 to 2006.
(a) Set up a scatter diagram with the cost of the double room on the y-axis and the year on the x-axis.

Year | Cost of double room ($)
1995 | 100
1996 | 120
1997 | 130
1998 | 145
1999 | 170
2000 | 230
2001 | 200
2002 | 195
2003 | 180
2004 | 185
2005 | 190
2006 | 205
(a) Scatterplot
[Figure: Scatterplot titled "Cost of double room". Cost ($), from 0 to 250, on the y-axis; Year, from 1994 to 2008, on the x-axis.]
Time series plot
Time series data are data recorded at regular time intervals (in our example, years). A time series plot has time on the x-axis and connects the data points with straight lines. Since this example records the data annually (i.e. at regular intervals), a time series plot is more appropriate than a scatterplot.
Time series plot

[Figure: Time series plot titled "Cost of double room". Cost ($), from 0 to 250, on the y-axis; Year, from 1994 to 2008, on the x-axis; consecutive points joined by straight lines.]
(b) Patterns in the data
There is a clear upward trend in room cost from 1995 to 2000. Around the time of the September 11 attacks, the cost decreases in each of the next three years (2001 to 2003). From 2004, the cost begins to increase again, but it does not (yet) return to the peak of $230 reached in 2000, before September 11.
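The patterns described above can be checked by computing the year-on-year change in cost. A minimal sketch using the costs from the table in Example 2-2:

```python
# Cost of a double room ($) by year, from the table in Example 2-2.
costs = {1995: 100, 1996: 120, 1997: 130, 1998: 145, 1999: 170, 2000: 230,
         2001: 200, 2002: 195, 2003: 180, 2004: 185, 2005: 190, 2006: 205}

years = sorted(costs)
# Year-on-year change: positive = increase, negative = decrease.
changes = {year: costs[year] - costs[prev] for prev, year in zip(years, years[1:])}

peak_year = max(costs, key=costs.get)
falling_years = [year for year, change in changes.items() if change < 0]

print("peak:", peak_year, costs[peak_year])
print("years with a decrease:", falling_years)
print("2006 below the peak?", costs[2006] < costs[peak_year])
```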
Example 2-3
A DVD hire company receives a number of complaints about its rental DVDs. The number of times each complaint occurred is given in the table.
Complaint               | Frequency
Scratched disc          | 125
Dirty disc              | 116
Cracked disc            | 21
Wrong DVD               | 54
Too expensive           | 39
Coarse language         | 26
Explicit content        | 41
Boring                  | 29
Too violent             | 18
Too soppy               | 27
Not funny               | 33
Rental period too short | 14
Bad movie               | 12
Rude staff              | 9
Store closed            | 4
(a) Construct a bar chart
Categories described in words (as in this example) are normally listed up the y-axis, and numerical categories (for example years, months or pay classification scales) are listed across the x-axis. Make sure you are confident preparing a bar chart by hand!! Always remember to label the axes and give the graph a title!
(a) Bar chart

[Figure: Bar Chart. Complaint categories on the vertical axis; frequency (0 to 140) on the horizontal axis.]
(b) Construct a pie chart

[Figure: Pie Chart of the complaint categories, each segment labelled with its category name and percentage.]
(b) Pie chart
The default view of this pie chart is difficult to read! You'll often have to adjust default graphs to improve their look (especially for assignments!!). Sometimes just resizing the graph can help!
(b) Pie chart

To produce a pie chart by hand you need a protractor (to measure degrees). In the exam you will only get easy category sizes (e.g. multiples of 45°). To calculate the degrees for each category:
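Complaint      | Frequency | %                      | Degrees
Scratched disc | 125       | (125/568) × 100 ≈ 22%  | (125/568) × 360 ≈ 79°
Dirty disc     | 116       | (116/568) × 100 ≈ 20%  | (116/568) × 360 ≈ 74°
Cracked disc   | 21        | (21/568) × 100 ≈ 4%    | (21/568) × 360 ≈ 13°

The total frequency is 125 + 116 + 21 + ... + 4 = 568 (the sum of all 15 frequencies in the table).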
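The same percentage and degree calculations can be done for every category at once. A minimal sketch; `pie_slices` is a hypothetical helper, and the total it uses is the sum of the frequencies in the table (568).

```python
complaints = {
    "Scratched disc": 125, "Dirty disc": 116, "Cracked disc": 21,
    "Wrong DVD": 54, "Too expensive": 39, "Coarse language": 26,
    "Explicit content": 41, "Boring": 29, "Too violent": 18,
    "Too soppy": 27, "Not funny": 33, "Rental period too short": 14,
    "Bad movie": 12, "Rude staff": 9, "Store closed": 4,
}


def pie_slices(freqs):
    """Return (category, frequency, percentage, degrees) for each category."""
    total = sum(freqs.values())
    return [(name, freq, freq * 100 / total, freq * 360 / total)
            for name, freq in freqs.items()]


slices = pie_slices(complaints)
for name, freq, pct, deg in slices:
    print(f"{name:<24} {freq:>4} {pct:5.1f}% {deg:6.1f} degrees")
print("total frequency:", sum(complaints.values()))
```

The degrees always sum to 360, which is a useful check before drawing with a protractor.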
(c) Pareto diagram
[Figure: Pareto Diagram. Complaint categories in decreasing order of frequency on the horizontal axis; percentage of complaints (0% to 25%) on the left axis and cumulative percentage (0% to 100%) on the right axis.]
(c) Pareto diagram
Pareto charts often have both vertical axes on matching scales, which Excel and PHStat2 do not do easily. On the following slide, the left-hand axis allows for the total frequency (568), and this lines up exactly with 100% on the cumulative percentage axis on the right-hand side. This graph also groups the very small categories (in this case only the last category) into a category called "Other".
(c) Pareto diagram

[Figure: Pareto Chart of Complaint. Count (0 to 600) on the left axis, Percent (0 to 100) on the right axis, complaint categories in decreasing order of count on the horizontal axis.]

Complaint               | Count | Percent | Cum %
Scratched disc          | 125   | 22      | 22
Dirty disc              | 116   | 20      | 42
Wrong DVD               | 54    | 10      | 52
Explicit content        | 41    | 7       | 59
Too expensive           | 39    | 7       | 66
Not funny               | 33    | 6       | 72
Boring                  | 29    | 5       | 77
Too soppy               | 27    | 5       | 82
Coarse language         | 26    | 5       | 86
Cracked disc            | 21    | 4       | 90
Too violent             | 18    | 3       | 93
Rental period too short | 14    | 2       | 96
Bad movie               | 12    | 2       | 98
Rude staff              | 9     | 2       | 99
Other                   | 4     | 1       | 100
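A Pareto table is the frequency table sorted into decreasing order, with the percentage and cumulative percentage added. A minimal sketch using the complaint frequencies from Example 2-3; unlike the chart, it keeps "Store closed" as its own category instead of grouping it as "Other", and `pareto_table` is a hypothetical helper.

```python
complaints = {
    "Scratched disc": 125, "Dirty disc": 116, "Cracked disc": 21,
    "Wrong DVD": 54, "Too expensive": 39, "Coarse language": 26,
    "Explicit content": 41, "Boring": 29, "Too violent": 18,
    "Too soppy": 27, "Not funny": 33, "Rental period too short": 14,
    "Bad movie": 12, "Rude staff": 9, "Store closed": 4,
}


def pareto_table(freqs):
    """Sort categories by decreasing frequency; add percent and cumulative %."""
    total = sum(freqs.values())
    rows, running = [], 0
    for name, count in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, count * 100 / total, running * 100 / total))
    return rows


rows = pareto_table(complaints)
for name, count, pct, cum_pct in rows:
    print(f"{name:<24} {count:>4} {pct:4.0f} {cum_pct:5.0f}")
```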
(d) Which graphical method do you think is best to portray this data?
Pie chart: too cramped with so many categories. The similar-sized segments are difficult to compare, and in some views the category labels overlap each other.
The Pareto chart is preferred over the bar chart, since it orders the categories from largest to smallest, includes the cumulative percentage polygon and makes it easy to see the most common complaints.
(e) Conclusions about most common complaints
The two most common complaints are scratched and dirty discs (22% and 20% respectively, or 42% of complaints in total). The third most common is wrong DVD (10%).
Additional Example: Grouped bar chart

Given the following two-way cross-classification table, construct a side-by-side bar chart comparing men and women for each of the three categories on the vertical axis. Discuss the resulting graph.
Position          | Men | Women
Junior accountant | 40  | 35
Accountant        | 50  | 30
Senior accountant | 25  | 5
Total             | 115 | 70
Grouped bar chart

[Figure: Gender and job position. Side-by-side horizontal bars for Male and Female in each job position (Accountant, Junior accountant, Senior accountant); count (0 to 60) on the horizontal axis.]
Grouped bar chart
Because there are more men than women in each of the three job positions (junior accountant, accountant and senior accountant), it is difficult to compare the ratio of men to women across the positions from the counts alone. Note that PHStat2 has changed the order of the three categories (to alphabetical order from bottom to top). It seems as if the relative number of women drops as the job position rises. It would be better to use relative frequencies (e.g. the percentage of women in each position) to compare gender differences.
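Relative frequencies within each job position make the gender comparison fairer, since the positions have different numbers of staff. A minimal sketch computing the percentage of women in each position; it takes the number of women senior accountants as 70 − 35 − 30 = 5 from the column total.

```python
# Counts by job position: (men, women). The senior-accountant count for
# women is taken as 70 - 35 - 30 = 5 from the column total.
positions = {
    "Junior accountant": (40, 35),
    "Accountant": (50, 30),
    "Senior accountant": (25, 5),
}

# Percentage of each position held by women (row percentages).
pct_women = {job: 100 * women / (men + women)
             for job, (men, women) in positions.items()}
for job, pct in pct_women.items():
    print(f"{job:<18} {pct:5.1f}% women")
```

The row percentages (about 47%, 38% and 17%) show the decline in the share of women more directly than the raw counts do.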
Features of graphical data

Basic features of an ideal graph include:
- Showing the data
- Getting the viewer to focus on the substance of the graph rather than on how the graph was developed
- Avoiding distortion
- Encouraging comparisons of data
- Serving a clear purpose
- Being integrated with the statistical and verbal descriptions of the graph
Source: Levine et al., 2005.
Principles of graphical excellence

Graphical excellence:
- is a well-designed presentation of data that provides substance, statistics and design
- communicates complex ideas with clarity, precision and efficiency
- gives the viewer the largest number of ideas in the shortest time with the least ink
- almost always involves several dimensions
- requires telling the truth about the data.
Source: Levine et al., 2005.
Data-Ink Ratio
The data-ink ratio is the proportion of a graphic's ink that is devoted to the nonredundant display of data information:
$$\text{Data-ink ratio} = \frac{\text{Data-ink}}{\text{Total ink used to print the graphic}}$$
Aim: maximise the proportion of the ink used in a graph that is devoted to data.
Graphical excellence
Chartjunk: decoration that is non-data-ink or redundant data-ink.
Lie factor: the ratio of the size of the effect shown in the graph to the size of the effect in the data.
Aim: eliminate chartjunk and keep the lie factor as close to 1 as possible (a lie factor of 1 means the graph shows the effect at its true size). We will discuss examples in more detail in tutorials, so it is important to be there!
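The lie factor can be illustrated with a small calculation. The sketch below measures the "size of effect" as the relative change between two values, which we believe follows Tufte's definition; the function names and the bar heights and data values are hypothetical, chosen only for illustration.

```python
def size_of_effect(first, second):
    """Relative change from first to second (e.g. 0.3 for a 30% increase)."""
    return abs(second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Size of effect shown in the graph / size of effect in the data."""
    return size_of_effect(shown_first, shown_second) / size_of_effect(data_first, data_second)


# Hypothetical example: the data rise from 100 to 130 (a 30% increase), but the
# bars are drawn 2 cm and 5 cm tall (a 150% increase).
print(lie_factor(2, 5, 100, 130))     # about 5: the graph exaggerates the effect
print(lie_factor(2, 2.6, 100, 130))   # about 1: bars drawn in proportion
```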
After the lecture each week
- Review the lecture material
- Complete all readings
- Complete all of the recommended problems from the textbook (listed in the SG)
- Complete at least some of the additional problems
- Consider (briefly) the discussion points prior to tutorials
