A CH1sp11

IE 256 - CH 1
DESCRIPTIVE STATISTICS
(Summarizing and describing the important features of the collected data by graphical and
numerical methods)
A. GRAPHICAL (VISUAL) SUMMARY MEASURES OF DATA SETS: Pictorial and
Tabular Methods (Stem-and-Leaf Displays, Dot Plots, Histograms, Scatter
Diagrams, Bar Graphs, Pie Charts, Frequency Tables, Tally Sheets)
(a) Stem-and-Leaf Displays: A stem-and-leaf display is a good way to obtain an informative
visual display of a data set x1, x2, ..., xn, where each number xi consists of at least two digits.
To construct a stem-and-leaf display, each number is divided into two parts:

(i) Stem: consists of one or more of the leading digits
(ii) Leaf: consists of the remaining digits
A stem-and-leaf display conveys information about the following aspects of the data:
Identification of a typical or representative value
Extent of spread about the typical value
Presence of any gaps in the data
Extent of symmetry in the distribution of values
Number and location of peaks
Presence of any outlying values.
Note: A stem-and-leaf display does not show the order in which observations were obtained.
Example 1: IE 256 Final Exam Scores (Sp10)
32 40 47 52 55 56 56 58 59 60 63 66 67 68 69 70 70 72 72 72 73
74 74 77 77 78 80 81 82 83 83 83 84 85 87 90 91
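Stem-and-leaf of the Final Exam Scores, N = 37
Leaf Unit = 1.0
(First column: cumulative count from the nearer end of the data; the count in parentheses
marks the row that contains the median. Second column: stem. Third column: leaves.)
  1   3  2
  1   3
  2   4  0
  3   4  7
  4   5  2
  9   5  56689
 11   6  03
 15   6  6789
 (8)  7  00222344
 14   7  778
 11   8  0123334
  4   8  57
  2   9  01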
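The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Minitab procedure: it assumes non-negative data and uses one row per stem, whereas the display above splits each stem into two rows (leaves 0-4 and 5-9). The function name `stem_and_leaf` and its `leaf_unit` parameter are illustrative choices.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1):
    """Split each value into a stem (leading digits) and a leaf (last digit)."""
    stems = defaultdict(list)
    for x in sorted(data):
        scaled = int(x // leaf_unit)          # express the value in leaf units
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)

# IE 256 final exam scores from Example 1
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Because the leaves are built from the sorted data, each row is already in increasing order, which makes gaps and peaks easy to see.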
IE 256/B.U./Ekşioğlu 1
Example 2:
Compressive Strength of 80 Aluminum-Lithium Alloy Specimens
Stem-and-Leaf Display for the compressive strength data.
(b) Dotplots
A dot diagram allows us to see two features of the data easily:

• Location (or the middle)
• Variability (or the scatter)
Example 1: Consider the final exam scores from the preceding example.
[Figure: Dot plot of final exam scores; horizontal axis: Final Exam Scores, from 32 to 92.]
Example 2:
Suppose an engineer is designing a nylon connector to be used in an automotive engine
application. The engineer is considering establishing the design specification on wall thickness
at 3/32 inch but is somewhat uncertain about the effect of this decision on the connector pull-off
force. If the pull-off force is too low, the connector may fail when it is installed in an engine.
Eight prototype units are produced and their pull-off forces measured, resulting in the following
data (in foot-pounds):
Pull-off forces measured: 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1
[Figure: Dot diagram of the pull-off force data when wall thickness is 3/32 inch; horizontal
axis: Pull-off force.]
Note: When the number of observations is small, it is usually difficult to identify any specific
pattern in the variability.
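A rough text version of a dot diagram can be produced by counting how often each value occurs. The sketch below is only an illustration: it prints one row per distinct value (a dot diagram rotated by 90 degrees) and does not space the rows to scale. The function name `text_dot_plot` is an assumption made for this example.

```python
from collections import Counter

def text_dot_plot(data):
    """Return one text row per distinct value, with one 'o' per occurrence."""
    counts = Counter(data)
    return [f"{value:>6} {'o' * counts[value]}" for value in sorted(counts)]

# Pull-off force data for the eight prototype connectors (Example 2)
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

plot_lines = text_dot_plot(forces)
for line in plot_lines:
    print(line)
```

The repeated value 12.6 shows up as two marks on the same row; with only eight observations, little else about the pattern of variability stands out.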
(c) Histograms: A histogram is a frequency distribution shown in graphical form.
To draw a histogram:
• For continuous data: Equal class widths
Use the horizontal axis to represent the measurement scale.
Subdivide the measurement axis into a suitable number of intervals (classes).
Rule of thumb: Number of classes ≈ $\sqrt{\text{number of observations}}$
The vertical axis represents the frequency (or relative frequency) scale.
Relative frequencies are found by dividing the observed frequency in each class
(interval) by the total number of observations:
Relative frequency of a class = (number of observations in the class) / (number of
observations in the data set)
Above each class interval, draw a rectangle whose height is the corresponding
frequency (or, alternatively, relative frequency).
• For continuous data: Unequal class widths
Determine the number of intervals (classes): use a few wider intervals near the
extreme observations and narrower intervals in the region of high concentration.
Determine frequencies and relative frequencies for each class.
Calculate the height of each rectangle by:
Rectangle height = density = (relative frequency of the class) / (class width)
Note: In determining the intervals, make sure that none of the observations lies on a
class boundary. For this purpose, either use < symbols to define the classes or add a
hundredths digit to the class boundaries to prevent observations from falling on the boundaries.
• For discrete data:
Determine the frequency and relative frequency of each x value:
Relative frequency of a value = (number of times the value occurs) / (number of
observations in the data set)
Then mark the possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that value.
• For categorical data:
Determine the frequency and relative frequency of each category.
Mark the different categories with equal intervals on a horizontal scale.
Above each category, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that category.
Cumulative frequency/relative frequency histogram: In this plot, the height of each
rectangle is the total number of observations (or the total relative frequency) less
than or equal to the upper limit of the rectangle's class.
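The quantities behind a histogram (frequency, relative frequency, density, and cumulative frequency) can be tabulated as in the sketch below, using the final exam scores from the stem-and-leaf example. The class boundaries 30, 40, ..., 100 are an assumption made for illustration, and each class is taken as lo <= x < hi so that no observation is counted in two classes. The function name `frequency_table` is hypothetical.

```python
import math

def frequency_table(data, edges):
    """One row per class [lo, hi): frequency, relative frequency, density, cumulative frequency."""
    n = len(data)
    rows, cumulative = [], 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        freq = sum(1 for x in data if lo <= x < hi)
        cumulative += freq
        rel = freq / n
        rows.append({"class": (lo, hi), "freq": freq, "rel_freq": rel,
                     "density": rel / (hi - lo), "cum_freq": cumulative})
    return rows

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

# Rule of thumb: number of classes is roughly the square root of n
suggested_classes = round(math.sqrt(len(scores)))

edges = list(range(30, 101, 10))   # assumed boundaries: 7 classes of width 10
table = frequency_table(scores, edges)
for row in table:
    print(row)
```

With equal class widths the density is just the relative frequency divided by a constant, so either can serve as the rectangle height; with unequal widths only the density gives rectangle areas proportional to the relative frequencies.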
Example:
Compressive Strength of 80 Aluminum-Lithium Alloy Specimens (n=80)
Example for Frequency Distribution and Histogram
Frequency distribution for the compressive strength data.
A histogram of the compressive strength data from Minitab with
9 bins (rectangles).
A histogram of the compressive strength data from Minitab with
17 bins (rectangles).
Histogram of compressive strength for 80 aluminum-lithium alloy specimens.
A cumulative frequency histogram of strength data from MINITAB (for 17
intervals).
Histogram Shapes
PARETO HISTOGRAM (Chart)
Airplane production in 1985 (Source: Boeing Company).
SCATTER DIAGRAM (Plot)
What it is:
A scatter diagram is a tool for analyzing the relationship between two variables. One variable is plotted on the horizontal axis and
the other on the vertical axis. The pattern of the plotted points shows graphically how the two variables are related. Most often
a scatter diagram is used to examine a suspected cause-and-effect relationship. While the diagram shows relationships, it does not by
itself prove that one variable causes the other. In addition to showing possible cause-and-effect relationships, a scatter diagram can
show that two variables are driven by a common, unknown cause, or that one variable can be used as a surrogate for the other.
When to use it (examples):
1. Use a scatter diagram to examine theories about cause-and-effect relationships and to search for root causes of an identified
problem.
2. Use a scatter diagram to design a control system to ensure that gains from quality improvement efforts are maintained.
[Figure: Scatter diagram with fitted line NGS = 0.537 + 0.9338*DGS; R-Sq = 76.7%, R-Sq(adj) = 76.4%.
Horizontal axis: Dominant Maximum Grip Strength (kgf); vertical axis: Non-Dominant Maximum Grip Strength (kgf).]
Female dominant (DGS) and non-dominant (NGS) maximum grip strength relationship
(Source: Ekşioğlu and Baykar, 2009)
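The fitted line reported with the grip strength scatter diagram, NGS = 0.537 + 0.9338*DGS, can be evaluated directly. The sketch below only restates that equation; the function name `predict_ngs` and the input value of 30 kgf are illustrative assumptions, and the line should not be trusted outside the range of the plotted data.

```python
def predict_ngs(dgs_kgf):
    """Non-dominant grip strength (kgf) predicted from dominant grip strength (kgf)."""
    return 0.537 + 0.9338 * dgs_kgf

# Illustrative input: a dominant maximum grip strength of 30 kgf
predicted = predict_ngs(30.0)
print(f"Predicted NGS for DGS = 30 kgf: {predicted:.3f} kgf")
```

A slope below 1 means that, on average, the fitted non-dominant strength is somewhat lower than the dominant strength over the plotted range.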
PIE CHART
Regional distribution of the family origin of the participants across Turkey (Source: Ekşioğlu and
Baykar, 2009)
BAR CHART
Source: Cherng, Ekşioğlu and Kızılaslan: Vibration reduction of pneumatic percussive rivet tools: Mechanical and
ergonomic re-design approaches. Applied Ergonomics 40 (2009) 256-266.
B. NUMERICAL SUMMARY MEASURES OF DATA SETS: Measures of
Location and Measures of Variability (Mean, Median, Trimmed Mean,
Variance, Standard Deviation, Range, Quartiles, Sample Proportions)
(a) MEASURES OF LOCATION OR CENTRAL TENDENCY
1. Population and Sample Means
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
N = finite number of observations in the population
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
n = number of observations in the sample (sample size)
Pull-off forces measured: 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1
[Figure: The sample mean as a balance point for a system of weights; horizontal axis: Pull-off force.]
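The balance-point interpretation can be checked numerically: the deviations of the observations from the sample mean sum to zero. A minimal sketch using the eight pull-off force observations (variable names are illustrative):

```python
# Pull-off force data for the eight prototype connectors
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

# Sample mean: sum of the observations divided by the sample size n
x_bar = sum(forces) / len(forces)

# Deviations from the mean cancel out, which is why the mean "balances" the data
deviations = [x - x_bar for x in forces]

print(f"sample mean = {x_bar:.2f}")
print(f"sum of deviations = {sum(deviations):.2e}")
```

The printed sum of deviations may differ from zero by a tiny floating-point rounding error.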
2. Trimmed Mean: It is obtained by eliminating the same proportion of observations from the
smallest and the largest ends of the ordered data, then averaging what is left.
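A trimmed mean can be sketched as follows. The trimming proportion of 12.5% (one observation from each end of the eight pull-off forces) is an arbitrary choice for illustration, and rounding the number of trimmed observations down is one convention among several; `trimmed_mean` is a hypothetical helper name.

```python
def trimmed_mean(data, proportion):
    """Drop the given proportion of observations from each end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = int(proportion * len(ordered))       # observations dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

tm = trimmed_mean(forces, 0.125)   # drops 12.3 and 13.6
print(f"12.5% trimmed mean = {tm:.4f}")
```

With a proportion of 0 the function returns the ordinary sample mean, so the trimmed mean can be seen as a compromise between the mean and the median.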
3. Population and Sample Medians: The median divides the data into two equal parts, half
below the median and half above.
Calculating the sample median $\tilde{x}$:
Step 1: Order the observations from smallest to largest (include any repeated
values).
Step 2: Calculate the median as follows:
when n is odd: $\tilde{x}$ = [(n+1)/2]th ordered value = the single middle value
when n is even: $\tilde{x}$ = [(n/2)th ordered value + (n/2 + 1)th ordered value] / 2
= average of the two middle values
Population median $\tilde{\mu}$: calculated in the same manner as the sample median, with
n replaced by N.
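The two-step median calculation translates almost line by line into code. A minimal sketch, applied to the two data sets used earlier in these notes (n = 37 is odd, n = 8 is even); note that Python lists are indexed from 0, so the [(n+1)/2]th ordered value sits at index n // 2.

```python
def sample_median(data):
    """Median by the two-step rule: sort, then take the middle value(s)."""
    ordered = sorted(data)                    # Step 1: order the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                            # Step 2, n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average of two middle values

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

print("median exam score:", sample_median(scores))
print("median pull-off force:", sample_median(forces))
```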
Three Different Shapes for a Population Distribution
4. Mode: Most frequently occurring data value
5. Quartiles: Quartiles divide the ordered data into four (4) equal parts.
The first or lower quartile, Q1: a value that has approximately one-fourth (25%) of the
observations below it and approximately 75% of the observations above it.
The second quartile, Q2: a value that has approximately one-half (50%) of the observations
below it; Q2 is the median.
The third quartile, Q3: a value that has approximately three-fourths (75%) of the
observations below it.
6. Percentiles: Similar to quartile calculation, a data set (sample or population) can be
divided into 100 equal parts using percentiles. For example, the 90th percentile
separates the highest 10% from the bottom 90%, and so on (will be covered in Ch. 4
in more detail).
7. Categorical Data and Sample Proportions
When data is categorical (qualitative), a natural numerical summary is the sample proportion
of observations in each category.
Example: Consider the cars in the BU parking lot:
Let n = number of cars in a sample selected from the lot.

Let's categorize the cars by the type of transmission they have:
Category A: cars with manual transmission; number in the sample = x
Category B: cars with automatic transmission; number in the sample = n - x
Then the sample proportions are:
proportion of manual cars = x/n
proportion of automatic cars = (n - x)/n = 1 - (x/n)
Similarly, p = population proportion (p represents the proportion of individuals in the
entire population that fall in the category). x/n is used to estimate p. Note that
0 ≤ x/n ≤ 1 and 0 ≤ p ≤ 1.
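The parking-lot example can be sketched as below. The counts n = 50 and x = 18 are hypothetical values chosen only to make the arithmetic concrete; they do not come from any actual survey. The function name `sample_proportions` is likewise illustrative.

```python
def sample_proportions(x, n):
    """Return (x/n, 1 - x/n) for x observations in a category out of n sampled."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p_hat = x / n
    return p_hat, 1 - p_hat

# Hypothetical sample: 50 cars, 18 of them with manual transmission
manual, automatic = sample_proportions(x=18, n=50)
print(f"manual: {manual:.2f}, automatic: {automatic:.2f}")
```

The input check enforces that the result lies between 0 and 1, matching the bounds on x/n stated above.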
(b) MEASURES OF VARIABILITY
8. Population and sample variance and standard deviation: Variance is a measure of the
spread of the data.
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
N = finite number of observations in the population
Standard deviation: a measure of the spread of the data expressed in the same units as the
data.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (positive square root)
The sample variance of the sample x1, x2, ..., xn of n values of X is
given by
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
n = number of observations in the sample (sample size)
An alternative expression for the numerator of $s^2$ is:
Complete it!
Sample standard deviation: $s = \sqrt{s^2}$ (positive square root)
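The defining formula for the sample variance, with its divisor n - 1, can be computed directly. A minimal sketch using the eight pull-off force observations from earlier in these notes (function and variable names are illustrative):

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

s2 = sample_variance(forces)
s = math.sqrt(s2)                 # positive square root, same units as the data
print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

The variance is in squared units of the data, which is why the standard deviation is usually the quantity reported alongside the mean.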
Properties of $s^2$
Let x1, x2, ..., xn be any sample and c be any nonzero constant.
1. If $y_1 = x_1 + c,\ y_2 = x_2 + c,\ \dots,\ y_n = x_n + c$, then $s_y^2 = s_x^2$.

That is, if a constant c is added to (or subtracted from) each data value (xi),
the variance is unchanged.
2. If $y_1 = c x_1,\ y_2 = c x_2,\ \dots,\ y_n = c x_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c|\, s_x$.

That is, multiplying each data point (xi) by c results in $s^2$ being
multiplied by a factor of $c^2$.
Prove both properties!
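Before attempting the proofs, the two properties can be checked numerically. The sketch below is a sanity check on one data set, not a proof; the constant c = -2.5 is an arbitrary choice, taken negative to show that the standard deviation scales by the absolute value |c|. It relies on `statistics.variance` and `statistics.stdev` from the Python standard library, which use the n - 1 divisor.

```python
import math
import statistics

x = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
c = -2.5                                   # arbitrary nonzero constant

shifted = [xi + c for xi in x]             # y_i = x_i + c
scaled = [c * xi for xi in x]              # y_i = c * x_i

var_x = statistics.variance(x)
shift_ok = math.isclose(statistics.variance(shifted), var_x, rel_tol=1e-9)
scale_ok = math.isclose(statistics.variance(scaled), c ** 2 * var_x, rel_tol=1e-9)
stdev_ok = math.isclose(statistics.stdev(scaled), abs(c) * statistics.stdev(x),
                        rel_tol=1e-9)

print("shift leaves variance unchanged:", shift_ok)
print("scaling multiplies variance by c^2:", scale_ok)
print("scaling multiplies std. deviation by |c|:", stdev_ok)
```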
9. Box Plots
Description of a boxplot
After the n observations in a data set are ordered from smallest to largest, the lower (upper) fourth is the median of the smallest
(largest) half of the data, where the median is included in both halves if n is odd. A measure of the spread that is resistant to outliers
is the fourth spread fs = upper fourth – lower fourth.
Outliers: Any observation farther than 1.5fs from the closest fourth is an outlier. An outlier is extreme if it is more than 3fs from the
nearest fourth, and it is mild otherwise.
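The fourths, the fourth spread, and the outlier rule can be sketched as follows, using the final exam scores from the stem-and-leaf example. The extra score of 20 added at the end is a made-up value, included only to show an observation being flagged; the function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def fourths(data):
    """Lower and upper fourths; the median is in both halves when n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2
    return median(ordered[:half]), median(ordered[n - half:])

def classify_outliers(data):
    """Return (fs, mild, extreme) using the 1.5*fs and 3*fs rules."""
    lower, upper = fourths(data)
    fs = upper - lower
    mild, extreme = [], []
    for x in sorted(data):
        distance = max(lower - x, x - upper)   # distance beyond the nearest fourth
        if distance > 3 * fs:
            extreme.append(x)
        elif distance > 1.5 * fs:
            mild.append(x)
    return fs, mild, extreme

scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

print(fourths(scores), classify_outliers(scores))
# Hypothetical extra score of 20, added only to illustrate a flagged value
print(classify_outliers(scores + [20]))
```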
Boxplot for the compressive strength data.
Boxplots for Fabs and Fnet forces: Subjects categorized by gender (source: Eksioglu, M.,
Kızılaslan, K., Steering-wheel grip force characteristics of drivers as a function of gender, speed,
and road condition. International Journal of Industrial Ergonomics, 2008).
[Figure: Boxplots by age group; vertical axis: Maximum Grip Strength (kgf); horizontal axis:
Age-Group (yrs): 18-29, 30-39, 40-49, 50-59, 60-69.]
Boxplots of maximum grip strength of Turkish females stratified by age groups (Source: Ekşioğlu and Baykar,
2009)
[Figure: Boxplots by gender; vertical axis: Maximum Grip Strength (kgf); horizontal axis:
Gender: F, M.]
Boxplots of grip strength stratified by gender for NME and students (18-69 yrs) (Source: Ekşioğlu and Baykar,
2009)
10. Time Series (Sequence) Data Plots: A time series is a sequence of observations which are
ordered in time (or space). If observations are made on some phenomenon throughout time, it
is most sensible to display the data in the order in which they arose, particularly since
successive observations will probably be dependent. Time series are best displayed in a scatter
plot. The series value X is plotted on the vertical axis and time t on the horizontal axis. Time is
called the independent variable (in this case however, something over which you have little
control). There are two kinds of time series data:
1. Continuous, where we have an observation at every instant of time, e.g. lie detectors,
electrocardiograms. We denote the observation X at time t by X(t).
2. Discrete, where we have an observation at (usually regularly) spaced intervals. We
denote this as $X_t$.
Examples
Economics - weekly share prices, monthly profits
Meteorology - daily rainfall, wind speed, temperature
Sociology - crime figures (number of arrests, etc), employment figures
