DESCRIPTIVE STATISTICS
(Summarizing and describing the important features of the collected data by graphical and
numerical methods)
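(a) Stem-and-Leaf Displays: A stem-and-leaf display is a good way to obtain an informative visual display of a data set x1,
x2, …, xn, where each number xi consists of at least two digits. Each number is split into a stem (its leading digit or digits) and a leaf (its trailing digit).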
Note: A stem-and-leaf display does not show the order in which observations were obtained.
Example 1: Final exam scores (N = 37)
32 40 47 52 55 56 56 58 59 60 63 66 67 68 69 70 70 72 72 72 73
74 74 77 77 78 80 81 82 83 83 83 84 85 87 90 91
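Stem-and-leaf of the Final Exam Scores   N = 37
Leaf Unit = 1.0
Depth  Stem  Leaves
    1     3  2
    1     3
    2     4  0
    3     4  7
    4     5  2
    9     5  56689
   11     6  03
   15     6  6789
  (8)     7  00222344
   14     7  778
   11     8  0123334
    4     8  57
    2     9  01
Each stem occupies two rows: the first holds leaves 0-4 and the second leaves 5-9. Depth is the cumulative number of observations counted from the nearer end of the display; the value in parentheses is the number of leaves in the row that contains the median.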
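The display can be reproduced with a short Python sketch. It assumes non-negative data, a leaf unit of 1, and two rows per stem (leaves 0-4 and 5-9), which is the split used for the final exam scores; the function name `stem_and_leaf` is hypothetical, and the depth column follows the usual Minitab-style convention (cumulative count from the nearer end, with the median row's count in parentheses) rather than any official implementation.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1):
    """Split-stem stem-and-leaf rows for non-negative data.

    Each stem gets two rows: leaves 0-4 ("low") and leaves 5-9 ("high").
    Returns (depth, stem, leaves) tuples; depth is the cumulative count from
    the nearer end, and the row containing the median shows "(count)".
    """
    rows = defaultdict(str)
    for x in sorted(data):
        value = int(round(x / leaf_unit))  # express x in leaf units
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf >= 5)] += str(leaf)

    # Keep every row between the first and last non-empty row, empty rows included.
    first, last = min(rows), max(rows)
    keys = [(s, high) for s in range(first[0], last[0] + 1) for high in (False, True)
            if first <= (s, high) <= last]

    n = len(data)
    result, from_top = [], 0
    for key in keys:
        leaves = rows.get(key, "")
        count = len(leaves)
        from_top += count
        from_bottom = n - from_top + count
        if from_top > n / 2 and from_bottom > n / 2:
            depth = f"({count})"  # this row holds the median
        else:
            depth = str(min(from_top, from_bottom))
        result.append((depth, key[0], leaves))
    return result

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

for depth, stem, leaves in stem_and_leaf(scores):
    print(f"{depth:>5}  {stem:>2}  {leaves}")
```

Splitting each stem into two rows spreads the 37 scores over 13 rows instead of 7, which makes the shape of the distribution easier to see.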
IE 256/B.U./Ekşioğlu 1
Example 2:
Stem-and-Leaf Display for the compressive strength data.
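(b) Dotplots: A dotplot (dot diagram) displays each observation as a dot above its value on a horizontal measurement scale; repeated values are stacked.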
Example 1: Consider the final exam scores from the preceding example.
Dotplot of the final exam scores; horizontal axis: Final Exam Scores.
Example 2:
Suppose an engineer is designing a nylon connector to be used in an automotive engine
application. The engineer is considering establishing the design specification on wall thickness
at 3/32 inch but is somewhat uncertain about the effect of this decision on the connector pull-off
force. If the pull-off force is too low, the connector may fail when it is installed in an engine.
Eight prototype units are produced and their pull-off forces measured, resulting in the following
data (in foot-pounds):
pull-off forces measured: 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1
Dot diagram of the pull-off force data when wall thickness is 3/32 inch.
Note: When the number of observations is small, it is usually difficult to identify any specific
pattern in the variability.
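A dot diagram can also be sketched in plain text. The minimal Python version below (hypothetical function `text_dotplot`) assumes the data are recorded to one decimal place, gives each tenth of a foot-pound its own column, and stacks repeated values.

```python
from collections import Counter

def text_dotplot(data, scale=10):
    """Return the lines of a text dot diagram.

    Each observation is rounded to 1/scale (tenths by default) and drawn as '*'
    in its own column; repeated values are stacked vertically.
    """
    counts = Counter(round(x * scale) for x in data)
    lo, hi = min(counts), max(counts)
    lines = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        row = "".join("*" if counts[t] >= level else " " for t in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * (hi - lo + 1))  # the horizontal measurement scale
    # Label the two ends of the scale with the smallest and largest values.
    lines.append(f"{lo / scale:.1f}" + " " * (hi - lo + 1 - 8) + f"{hi / scale:.1f}")
    return lines

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # pull-off forces, ft-lb
print("\n".join(text_dotplot(forces)))
```

Only the value 12.6 occurs twice, so the diagram is a single row of dots with one stacked point, which illustrates the note above: eight observations show little pattern.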
(c) Histograms: A histogram is a graphical display of a frequency distribution.
To draw a histogram:
For continuous data: Equal class widths
Use the horizontal axis to represent the measurement scale.
Subdivide the measurement axis into a suitable number of intervals (classes).
The vertical axis represents the frequency (or relative frequency) scale.
Relative frequencies are found by dividing the observed frequency in each class
(interval) by the total number of observations:
Relative frequency of a class = (number of observations in the class) / (number of observations in the data set)
Above each class interval, draw a rectangle whose height is the corresponding
frequency (or, alternatively, relative frequency).
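The equal-width procedure can be sketched in Python as a frequency table with a text bar for each class. The class start (30) and width (10) used for the final exam scores are illustrative choices, not values from these notes, and `frequency_table` is a hypothetical helper; half-open classes [lower, upper) are one way of keeping observations off the class boundaries.

```python
def frequency_table(data, start, width, k):
    """Equal-width classes [start + i*width, start + (i+1)*width), i = 0..k-1.

    Half-open classes (lower <= x < upper) assign every observation to exactly
    one class. Returns (lower, upper, frequency, relative frequency) tuples.
    """
    n = len(data)
    table = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, freq, freq / n))  # relative freq = freq / n
    return table

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

# Each '#' is one observation, so the bar lengths are the rectangle heights.
for lower, upper, freq, rel in frequency_table(scores, start=30, width=10, k=7):
    print(f"[{lower}, {upper})  f={freq:2d}  rel={rel:.3f}  " + "#" * freq)
```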
For continuous data: Unequal class widths
Determine the number of intervals (classes): use a few wider intervals near the
extreme observations and narrower intervals in regions of high concentration.
Determine the frequency and relative frequency of each class.
Calculate the height of each rectangle as:
Rectangle height = density = (relative frequency of the class) / (class width)
With this choice, the area of each rectangle equals the relative frequency of its class, so the total area of the histogram is 1.
Note: When choosing the intervals, make sure that no observation lies on a
class boundary. For this purpose, either define the classes with strict inequalities (<) or add a hundredths digit to the
class boundaries so that observations cannot fall on them.
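For unequal widths, a minimal sketch (hypothetical helper `density_table`) computes each rectangle height as relative frequency divided by class width. The class boundaries below, one wide class for the few low exam scores and 10-point classes elsewhere, are an assumed illustration, not part of the original example.

```python
def density_table(data, boundaries):
    """Classes [b[i], b[i+1]) with possibly unequal widths.

    Rectangle height = density = relative frequency / class width, so each
    rectangle's area equals the relative frequency of its class.
    """
    n = len(data)
    table = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        freq = sum(1 for x in data if lower <= x < upper)
        rel = freq / n
        table.append((lower, upper, freq, rel, rel / (upper - lower)))
    return table

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

# Assumed boundaries: a wide class [30, 50) where scores are sparse,
# narrower 10-point classes where they are concentrated.
bounds = [30, 50, 60, 70, 80, 90, 100]
for lower, upper, freq, rel, dens in density_table(scores, bounds):
    print(f"[{lower}, {upper})  f={freq:2d}  rel={rel:.3f}  height={dens:.4f}")
```

Using frequency as the height would make the wide [30, 50) class look twice as important as it is; dividing by the width keeps area proportional to the share of observations.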
For discrete data:
Determine the frequency and relative frequency of each x value:
Relative frequency of a value = (number of times the value occurs) / (number of observations in the data set)
Then mark the possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that value.
For categorical data:
Determine the frequency and relative frequency of each category.
Mark the categories at equally spaced positions on a horizontal scale.
Above each category, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that category.
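For discrete or categorical data, the frequency and relative frequency of each distinct value can be tallied with `collections.Counter`. In this sketch the integer final exam scores are treated as a discrete variable purely for illustration; `value_frequencies` is a hypothetical name.

```python
from collections import Counter

def value_frequencies(data):
    """Frequency and relative frequency of each distinct value (or category).

    Values are returned in sorted order so they can be laid out along a
    horizontal scale; each bar height is the frequency (or relative frequency).
    """
    n = len(data)
    counts = Counter(data)
    return [(value, f, f / n) for value, f in sorted(counts.items())]

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

for value, f, rel in value_frequencies(scores):
    print(f"{value:>3}  f={f}  rel={rel:.3f}  " + "#" * f)
```

For categorical data the same function accepts category labels such as strings; sorted order is then alphabetical, so reorder the result if the categories have a natural order.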
Example:
Compressive Strength of 80 Aluminum-Lithium Alloy Specimens (n=80)
Example for Frequency Distribution and Histogram
A histogram of the compressive strength data from Minitab with
9 bins (rectangles).
A histogram of the compressive strength data from Minitab with
17 bins (rectangles).
Histogram of compressive strength for 80 aluminum-lithium alloy specimens.
A cumulative frequency histogram of strength data from MINITAB (for 17
intervals).
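A cumulative frequency histogram plots, for each class, the number of observations in that class and in all classes below it. The compressive strength data are not reproduced in these notes, so this sketch instead uses the final exam scores with assumed 10-point classes and `itertools.accumulate` for the running totals.

```python
from itertools import accumulate

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

lowers = range(30, 100, 10)  # assumed classes [30, 40), [40, 50), ..., [90, 100)
freqs = [sum(1 for x in scores if lo <= x < lo + 10) for lo in lowers]
cumulative = list(accumulate(freqs))  # running total of the class frequencies

for lo, f, cf in zip(lowers, freqs, cumulative):
    print(f"[{lo}, {lo + 10})  f={f:2d}  cumulative={cf:2d}  " + "#" * cf)
```

The last cumulative frequency always equals the number of observations, which is a quick check that no observation was missed or counted twice.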
Histogram Shapes
PARETO HISTOGRAM (Chart)
SCATTER DIAGRAM (Plot)
What it is:
A scatter diagram is a tool for analyzing the relationship between two variables. One variable is plotted on the horizontal axis and
the other on the vertical axis; the pattern of the plotted points can show the relationship graphically. Most often,
a scatter diagram is used to investigate possible cause-and-effect relationships. However, while the diagram shows relationships, it does not by
itself prove that one variable causes the other. In addition to suggesting possible cause-and-effect relationships, a scatter diagram can
show that two variables stem from a common, unknown cause or that one variable can be used as a surrogate for the other.
When to use it (examples):
1. Use a scatter diagram to examine theories about cause-and-effect relationships and to search for root causes of an identified
problem.
2. Use a scatter diagram to design a control system to ensure that gains from quality improvement efforts are maintained.
Female dominant (DGS) and non-dominant (NGS) maximum grip strength relationship; horizontal axis: Dominant Maximum Grip Strength (kgf)
(Source: Ekşioğlu and Baykar, 2009)
PIE CHART
Regional distribution of the family origin of the participants across Turkey (Source: Ekşioğlu and
Baykar, 2009)
BAR CHART
Source: Cherng, Ekşioğlu and Kızılaslan: Vibration reduction of pneumatic percussive rivet tools: Mechanical and
ergonomic re-design approaches. Applied Ergonomics 40 (2009) 256–266.
B. NUMERICAL SUMMARY MEASURES OF DATA SETS: Measures of
Location and Measures of Variability (Mean, Median, Trimmed Mean,
Variance, Standard Deviation, Range, Quartiles, Sample Proportions)
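1. Mean: The sample mean of x1, x2, …, xn is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; the population mean $\mu$ of a finite population of N values is computed in the same way, with n replaced by N.
Example (pull-off force data, ft-lb): 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1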
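As a quick numerical check, the sample mean (the sum of the observations divided by n) of the pull-off force data can be computed directly in Python and compared with `statistics.mean` from the standard library.

```python
import statistics

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # pull-off forces, ft-lb

n = len(forces)
x_bar = sum(forces) / n  # sample mean: sum of the observations divided by n
print(f"n = {n}, sum = {sum(forces):.1f}, mean = {x_bar:.2f}")
print(f"statistics.mean: {statistics.mean(forces):.2f}")
```

The eight forces sum to 104.0 ft-lb, so the sample mean is 104.0 / 8 = 13.0 ft-lb.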
2. Trimmed Mean: A trimmed mean is obtained by removing an equal proportion of observations from each end (smallest and
largest) of the ordered data and then averaging the remaining observations.
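A minimal trimmed-mean sketch follows. It assumes the number of observations removed from each end is int(proportion * n), truncated rather than interpolated (textbooks differ on how to handle a non-integer count); `trimmed_mean` is a hypothetical helper. With 12.5% trimming of the eight pull-off forces, exactly one observation is removed from each end.

```python
def trimmed_mean(data, proportion):
    """Remove int(proportion * n) observations from EACH end of the ordered
    data, then average the rest. (Assumption: the count is truncated.)"""
    x = sorted(data)
    k = int(proportion * len(x))
    if 2 * k >= len(x):
        raise ValueError("proportion too large: no observations would remain")
    kept = x[k:len(x) - k]
    return sum(kept) / len(kept)

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # pull-off forces, ft-lb
print(f"12.5% trimmed mean: {trimmed_mean(forces, 0.125):.4f}")  # drops 12.3 and 13.6
print(f"untrimmed mean:     {trimmed_mean(forces, 0.0):.4f}")
```

Because the extreme values are discarded, the trimmed mean is less sensitive to outliers than the ordinary mean while still using most of the data.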
3. Population and Sample Medians: The median divides the ordered data into two equal parts, half of the observations below the
median and half above.
Calculating the sample median, $\tilde{x}$:
Step 1: Order the n observations from smallest to largest (include any repeated
values).
Step 2: Calculate the median as follows:
When n is odd: $\tilde{x}$ = the $\left(\frac{n+1}{2}\right)$th ordered value = the single middle value
When n is even: $\tilde{x}$ = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values
= the average of the two middle values
Population median: calculated in the same manner as the sample median, with n replaced by N.
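The two-step median rule translates directly into Python; this sketch (hypothetical helper `sample_median`) applies it to the final exam scores (n = 37, odd) and the pull-off forces (n = 8, even), cross-checking against `statistics.median` from the standard library.

```python
import statistics

def sample_median(data):
    x = sorted(data)                    # Step 1: order the observations
    n = len(x)
    if n % 2 == 1:                      # n odd: the ((n + 1)/2)th ordered value
        return x[(n + 1) // 2 - 1]      # -1 converts the 1-based rank to a 0-based index
    # n even: average of the (n/2)th and (n/2 + 1)th ordered values
    return (x[n // 2 - 1] + x[n // 2]) / 2

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

for name, data in (("exam scores", scores), ("pull-off forces", forces)):
    print(name, sample_median(data), statistics.median(data))
```

For the exam scores the median is the 19th ordered value, 72; for the pull-off forces it is the average of the 4th and 5th ordered values, (12.9 + 13.1) / 2 = 13.0.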
The first (lower) quartile, Q1, is a value that has approximately one-fourth (25%) of the
observations below it and approximately 75% of the observations above it.
The second quartile, Q2, has approximately one-half (50%) of the observations below it;
Q2 is the median.
The third quartile, Q3, has approximately three-fourths (75%) of the observations below it.
6. Percentiles: Similar to quartiles, percentiles divide a data set (sample or population) into
100 equal parts. For example, the 90th percentile separates the highest 10% of the observations
from the lowest 90%, and so on (covered in more detail in Ch. 4).
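Quartiles and percentiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). Its default `method="exclusive"` interpolates at rank i(N + 1)/k for the i-th of the k - 1 cut points; other textbooks and packages use slightly different rules (including the fourths used for box plots), so results can differ a little from hand calculations. The final exam scores are used as an illustration.

```python
import statistics

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

# n=4 gives the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
# n=10 gives the nine deciles; the last one is the 90th percentile.
p90 = statistics.quantiles(scores, n=10)[-1]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, 90th percentile = {p90}")
```

For the 37 exam scores this gives Q1 = 59.5, Q2 = 72.0 (the median) and Q3 = 81.5, and a 90th percentile of 85.4.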
8. Population and Sample Variance and Standard Deviation: Variance is a measure of the
spread of the data.
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
where N is the (finite) number of observations in the population and $\mu$ is the population mean.
Standard deviation: a measure of the spread of the data expressed in the same units as the
data.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (positive square root)
The sample variance of a sample $x_1, x_2, \ldots, x_n$ of n values of X is
given by
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Complete it!
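The variance formulas translate directly into Python. This sketch computes both the sample and the population versions for the pull-off force data (treating the eight values as a population only to show the effect of the divisor N) and cross-checks them against `statistics.variance` and `statistics.pvariance`.

```python
import math
import statistics

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # pull-off forces, ft-lb
n = len(forces)
x_bar = sum(forces) / n
ss = sum((x - x_bar) ** 2 for x in forces)  # sum of squared deviations from the mean

s2 = ss / (n - 1)         # sample variance: divisor n - 1
s = math.sqrt(s2)         # sample standard deviation, in ft-lb
sigma2 = ss / n           # population variance formula: divisor N (= n here)
sigma = math.sqrt(sigma2)

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
print(f"sigma^2 = {sigma2:.4f}, sigma = {sigma:.4f}")
# Cross-check against the standard library implementations.
print(math.isclose(s2, statistics.variance(forces)),
      math.isclose(sigma2, statistics.pvariance(forces)))
```

The sum of squared deviations is 1.60 (ft-lb)^2, so s^2 = 1.60 / 7 ≈ 0.229 (ft-lb)^2 and s ≈ 0.48 ft-lb.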
Properties of $s^2$
9. Box Plots
Description of a boxplot
After the n observations in a data set are ordered from smallest to largest, the lower (upper) fourth is the median of the smallest
(largest) half of the data, where the median is included in both halves if n is odd. A measure of the spread that is resistant to outliers
is the fourth spread fs = upper fourth – lower fourth.
Outliers: Any observation farther than 1.5fs from the closest fourth is an outlier. An outlier is extreme if it is more than 3fs from the
nearest fourth, and it is mild otherwise.
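The fourths, the fourth spread and the outlier rules can be sketched as follows. The helper names `fourths` and `classify_outliers` are hypothetical; the halves are formed exactly as described (median included in both halves when n is odd), which can give values slightly different from other quartile conventions. The final exam scores are used as an illustration.

```python
from statistics import median

def fourths(data):
    """Lower and upper fourths: medians of the smallest and largest halves.
    When n is odd, the median observation is included in both halves."""
    x = sorted(data)
    n = len(x)
    half = (n + 1) // 2  # size of each half
    return median(x[:half]), median(x[n - half:])

def classify_outliers(data):
    """Return the fourth spread fs and a list of (value, 'mild'/'extreme')."""
    lower, upper = fourths(data)
    fs = upper - lower  # fourth spread
    result = []
    for v in sorted(data):
        # Distance beyond the closest fourth (zero or negative if v lies between the fourths).
        distance = max(lower - v, v - upper)
        if distance > 3 * fs:
            result.append((v, "extreme"))
        elif distance > 1.5 * fs:
            result.append((v, "mild"))
    return fs, result

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69, 70, 70, 72, 72,
          72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83, 83, 83, 84, 85, 87, 90, 91]

lower, upper = fourths(scores)
fs, outliers = classify_outliers(scores)
print(f"lower fourth = {lower}, upper fourth = {upper}, fs = {fs}")
print("outliers:", outliers or "none")
```

For the exam scores, the lower and upper fourths are 60 and 81, so fs = 21; the smallest score, 32, lies 28 below the lower fourth, which is less than 1.5fs = 31.5, so no score is flagged as an outlier.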
Boxplot for the compressive strength data.
Boxplots for Fabs and Fnet forces: Subjects categorized by gender (source: Eksioglu, M.,
Kızılaslan, K., Steering-wheel grip force characteristics of drivers as a function of gender, speed,
and road condition. International Journal of Industrial Ergonomics, 2008).
Boxplots of maximum grip strength of Turkish females stratified by age groups (Source: Ekşioğlu and Baykar,
2009)
Boxplots of grip strength stratified by gender (F, M) for NME and students (18-69 yrs); vertical axis: Maximum Grip Strength (kgf) (Source: Ekşioğlu and Baykar,
2009)
10. Time Series (Sequence) Data Plots: A time series is a sequence of observations which are
ordered in time (or space). If observations are made on some phenomenon throughout time, it
is most sensible to display the data in the order in which they arose, particularly since
successive observations will probably be dependent. Time series are best displayed in a scatter
plot, with the series value X on the vertical axis and time t on the horizontal axis. Time is
called the independent variable (although, in this case, it is something over which you have little
control). There are two kinds of time series data:
1. Continuous, where we have an observation at every instant of time, e.g. lie detectors,
electrocardiograms. We denote this using observation X at time t, X(t).
Examples
Economics - weekly share prices, monthly profits
Meteorology - daily rainfall, wind speed, temperature
Sociology - crime figures (number of arrests, etc), employment figures