
IE 256 - CH 1

DESCRIPTIVE STATISTICS
(Summarizing and describing the important features of the collected data by graphical and
numerical methods)

A. GRAPHICAL (VISUAL) SUMMARY MEASURES OF DATA SETS: PICTORIAL AND
TABULAR METHODS (Stem-and-Leaf Displays, Dot Plots, Histograms, Scatter
Diagrams, Bar Graphs, Pie Charts, Frequency Tables, Tally Sheets)

(a) Stem-and-Leaf Displays: A stem-and-leaf display is a good way to obtain an informative visual
display of a data set x1, x2, ..., xn, where each number xi consists of at least two digits.

To construct a stem-and-leaf display, each number is divided into two parts:


(i) Stem: consists of one or more of the leading digits
(ii) Leaf: consists of the remaining digits
A stem-and-leaf display conveys information about the following aspects of the data:
Identification of a typical or representative value
Extent of spread about the typical value
Presence of any gaps in the data
Extent of symmetry in the distribution of values
Number and location of peaks
Presence of any outlying values.

Note: A stem-and-leaf display does not show the order in which observations were obtained.

Example 1: Consider the IE 256 Final Exam Scores (Sp10):

32 40 47 52 55 56 56 58 59 60 63 66 67 68 69 70 70 72 72 72 73
74 74 77 77 78 80 81 82 83 83 83 84 85 87 90 91
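Stem-and-leaf of the Final Exam Scores, N = 37
Leaf Unit = 1.0

Count  Stem  Leaves
   1     3   2
   1     3
   2     4   0
   3     4   7
   4     5   2
   9     5   56689
  11     6   03
  15     6   6789
  (8)    7   00222344
  14     7   778
  11     8   0123334
   4     8   57
   2     9   01

Each stem is split into two rows (leaves 0-4 and leaves 5-9). The first column is the cumulative
count of observations from the nearer end of the display; the value in parentheses is the number
of observations in the row that contains the median.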
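The construction rule (split each number into a stem and a leaf) can be sketched in a few lines of Python. This is a minimal illustration, not the Minitab procedure: the function name `stem_and_leaf` is hypothetical, it assumes two-digit integer data, and it uses one row per stem rather than the split stems shown in the display for Example 1.

```python
from collections import defaultdict

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its ordered list of leaves (units digits)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 72 -> stem 7, leaf 2
        rows[stem].append(leaf)
    return dict(rows)

display = stem_and_leaf(scores)

# Print every stem between the smallest and largest, so gaps stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Printing empty stems as blank rows is a deliberate choice here, because gaps in the data are one of the features the display is meant to reveal.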
IE 256/B.U./Ekşioğlu 1
Example 2:

Compressive Strength of 80 Aluminum-Lithium Alloy Specimens

Stem-and-Leaf Display for the compressive strength data.

(b) Dotplots

A dot diagram makes two features of the data easy to see:

• Location (or the middle)
• Variability (or the scatter)

Example 1: Consider the final exam scores from the preceding example.

[Figure: Dot plot of final exam scores. Horizontal axis: Final Exam Scores, ticks from 32 to 92.]

Example 2:
Suppose an engineer is designing a nylon connector to be used in an automotive engine
application. The engineer is considering establishing the design specification on wall thickness
at 3/32 inch but is somewhat uncertain about the effect of this decision on the connector pull-off
force. If the pull-off force is too low, the connector may fail when it is installed in an engine.
Eight prototype units are produced and their pull-off forces measured, resulting in the following
data (in foot-pounds):

pull-off forces measured: 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1

[Figure: Dot diagram of the pull-off force data when wall thickness is 3/32 inch. Horizontal axis:
Pull-off force, ticks at 12.5, 13.0, 13.5.]

Note: When the number of observations is small, it is usually difficult to identify any specific
pattern in the variability.
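A dot diagram can be approximated in plain text by printing one dot per observation next to each distinct value. The sketch below does this for the eight pull-off forces; the helper name `dot_counts` is hypothetical, and a text listing is only a rough stand-in for a plotted diagram.

```python
from collections import Counter

# Pull-off forces of the eight prototype connectors (Example 2)
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

def dot_counts(values):
    """Count how many observations fall on each distinct value."""
    return Counter(values)

counts = dot_counts(forces)

# One row per distinct value, smallest first; repeated values get stacked dots.
for value in sorted(counts):
    print(f"{value:5.1f} " + "." * counts[value])
```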

(c) Histograms: A histogram is a frequency distribution shown in graphical form.
To draw a histogram:
• For continuous data with equal class widths:
Use the horizontal axis to represent the measurement scale.
Subdivide the measurement axis into a suitable number of intervals (classes).

Rule of thumb: Number of classes ≈ $\sqrt{n}$, where n is the number of observations.

The vertical axis represents the frequency (or relative frequency) scale.
Relative frequencies are found by dividing the observed frequency in each class
(interval) by the total number of observations:

Relative frequency of a class = (number of observations in the class) / (number of observations
in the data set)

Above each class interval, draw a rectangle whose height is the corresponding
frequency (or, alternatively, relative frequency).
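The equal-width recipe can be sketched for the final exam scores of Example 1. The starting point (the smallest score) and the way the class width is rounded up are assumptions made for this illustration, and `frequency_table` is a hypothetical helper name. Classes are taken as lower <= x < upper.

```python
import math

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

def frequency_table(values, lower, width, k):
    """Count observations in k classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * k
    for v in values:
        i = int((v - lower) // width)
        if not 0 <= i < k:
            raise ValueError(f"{v} falls outside the classes")
        counts[i] += 1
    return counts

n = len(scores)
k = round(math.sqrt(n))                  # rule of thumb: about sqrt(n) classes
lower = min(scores)                      # assumed starting point of the first class
# The +1 keeps the largest integer score strictly inside the last class.
width = math.ceil((max(scores) - lower + 1) / k)

counts = frequency_table(scores, lower, width, k)
relative = [c / n for c in counts]       # relative frequency = count / n

for i, (c, r) in enumerate(zip(counts, relative)):
    start = lower + i * width
    print(f"{start} <= x < {start + width}: frequency {c}, relative frequency {r:.3f}")
```

With these choices the sketch produces six classes of width 10 starting at 32. A different starting point or width would give a different but equally valid histogram.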
• For continuous data with unequal class widths:
Determine the number of intervals (classes): use a few wider intervals near the
extreme observations and narrower intervals in the region of high concentration.
Determine the frequency and relative frequency of each class.
Calculate the height of each rectangle by:

Rectangle height = density = (relative frequency of the class) / (class width)

Note: In determining the intervals, make sure that no observation lies on a
class boundary. For this purpose, either use < symbols in the class definitions (for example,
a ≤ x < b) or add a hundredths digit to the class boundaries.
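The density calculation for unequal class widths can be sketched as follows. The class boundaries in `edges` are an assumption chosen only for this illustration (wider classes at the two ends of the exam scores), and `density_heights` is a hypothetical helper name. Classes are half-open, a <= x < b, so no observation is ambiguous at a boundary.

```python
import bisect

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

# Assumed class boundaries: wide classes at the extremes, narrow in the middle.
edges = [30, 50, 60, 70, 80, 95]

def density_heights(values, edges):
    """Return (counts, densities) for the classes [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1   # index of the class containing v
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} falls outside the classes")
        counts[i] += 1
    n = len(values)
    # density = relative frequency / class width
    densities = [c / n / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]
    return counts, densities

counts, densities = density_heights(scores, edges)

# The rectangle areas (density * width) add up to 1.
total_area = sum(d * (hi - lo) for d, lo, hi in zip(densities, edges, edges[1:]))
print(counts, round(total_area, 6))
```

Using density rather than frequency for the height means that the area of each rectangle, not its height, represents the share of observations in the class.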
• For discrete data:
Determine the frequency and relative frequency of each x value:

Relative frequency of a value = (number of times the value occurs) / (number of observations
in the data set)

Then mark the possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that value.
• For categorical data:
Determine the frequency and relative frequency of each category.
Mark the different categories at equal intervals on a horizontal scale.
Above each category, draw a rectangle whose height is the corresponding frequency (or,
alternatively, relative frequency) of that category.

Cumulative frequency/relative frequency histogram: In this plot, the height of each
rectangle is the total number (or relative frequency) of observations that are less
than or equal to the upper limit of that rectangle's class.
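Cumulative heights can be computed by counting, for each class, the observations at or below its upper limit. The sketch below uses the exam scores with assumed integer upper limits 41, 51, ..., 91 (an illustrative choice, not taken from the notes).

```python
import bisect

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

# Assumed upper class limits for six classes of integer scores.
upper_limits = [41, 51, 61, 71, 81, 91]

ordered = sorted(scores)
# bisect_right gives the number of ordered values <= the limit.
cumulative = [bisect.bisect_right(ordered, u) for u in upper_limits]
cumulative_relative = [c / len(ordered) for c in cumulative]

for u, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(f"x <= {u}: cumulative frequency {c}, cumulative relative frequency {r:.3f}")
```

The last cumulative frequency always equals the number of observations, which is a quick check on the class limits.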

Example:
Compressive Strength of 80 Aluminum-Lithium Alloy Specimens (n=80)

Example for Frequency Distribution and Histogram

Frequency distribution for the compressive strength data.

A histogram of the compressive strength data from Minitab with
9 bins (rectangles).

A histogram of the compressive strength data from Minitab with
17 bins (rectangles).

Histogram of compressive strength for 80 aluminum-lithium alloy specimens.
A cumulative frequency histogram of strength data from MINITAB (for 17
intervals).

Histogram Shapes

PARETO HISTOGRAM (Chart)

Airplane production in 1985 (Source: Boeing Company).

SCATTER DIAGRAM (Plot)
What it is:
A scatter diagram is a tool for analyzing the relationship between two variables. One variable is plotted on the horizontal axis and
the other on the vertical axis, and the pattern of the plotted points shows graphically how the two are related. Most often
a scatter diagram is used to examine a suspected cause-and-effect relationship. While the diagram shows a relationship, it does not by
itself prove that one variable causes the other. In addition to showing possible cause-and-effect relationships, a scatter diagram can
show that two variables both result from a common cause that is unknown, or that one variable can be used as a surrogate for the other.
When to use it (examples):
1. Use a scatter diagram to examine theories about cause-and-effect relationships and to search for root causes of an identified
problem.
2. Use a scatter diagram to design a control system to ensure that gains from quality improvement efforts are maintained.
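A scatter diagram is often read together with a numerical measure of linear association. The sketch below computes the sample correlation coefficient r for a small set of paired values. The data are hypothetical and serve only to show the calculation; `pearson_r` is an illustrative helper name. As with the diagram itself, a value of r near +1 or -1 describes association and does not establish causation.

```python
import math

# Hypothetical paired observations, for illustration only (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def pearson_r(xs, ys):
    """Sample correlation coefficient of the paired values (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```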

[Figure: Scatter plot with fitted line NGS = 0.537 + 0.9338*DGS; R-Sq = 76.7%, R-Sq(adj) = 76.4%.
Horizontal axis: Dominant Maximum Grip Strength (kgf); vertical axis: Non-Dominant Maximum Grip
Strength (kgf).]

Female dominant (DGS) and non-dominant (NGS) maximum grip strength relationship
(Source: Ekşioğlu and Baykar, 2009)
PIE CHART

Regional distribution of the family origin of the participants across Turkey (Source: Ekşioğlu and
Baykar, 2009)

BAR CHART

Source: Cherng, Ekşioğlu and Kızılaslan: Vibration reduction of pneumatic percussive rivet tools: Mechanical and
ergonomic re-design approaches. Applied Ergonomics 40 (2009) 256-266

B. NUMERICAL SUMMARY MEASURES OF DATA SETS: Measures of
Location and Measures of Variability (Mean, Median, Trimmed Mean,
Variance, Standard Deviation, Range, Quartiles, Sample Proportions)

(a) MEASURES OF LOCATION OR CENTRAL TENDENCY

1. Population and Sample Means

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

N = finite number of observations in the population

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

n = number of observations in the sample (sample size)

Pull-off forces measured: 12.6 12.9 13.4 12.3 13.6 13.5 12.6 13.1

[Figure: The sample mean as a balance point for a system of weights. Horizontal axis: Pull-off
force, ticks at 12.5, 13.0, 13.5.]
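As a quick check of the sample mean formula and the balance-point interpretation, the sketch below computes the mean of the eight pull-off forces and verifies that the deviations about it sum to zero (up to floating-point rounding). The function name `sample_mean` is illustrative.

```python
# Pull-off forces of the eight prototype connectors
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

def sample_mean(values):
    """Sum of the observations divided by the sample size n."""
    return sum(values) / len(values)

x_bar = sample_mean(forces)

# Balance point: positive and negative deviations cancel out.
balance = sum(v - x_bar for v in forces)

print(f"sample mean = {x_bar:.2f}")
print(f"sum of deviations = {balance:.10f}")
```

The observations add up to 104.0, so the sample mean is 104.0 / 8 = 13.0.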

2. Trimmed Mean: It is obtained by eliminating an equal proportion of observations from the smallest
and largest ends of the ordered data, then averaging what is left.
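A trimmed mean can be sketched as below. One assumption is made explicit in the code: when the trimming proportion times n is not a whole number, the number of observations dropped from each end is rounded down. Textbooks and software handle that case in different ways, so this is only one possible convention, and `trimmed_mean` is a hypothetical helper name.

```python
# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

def trimmed_mean(values, proportion):
    """Drop int(proportion * n) observations from each end, average the rest."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))      # assumption: round down
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming proportion removes every observation")
    return sum(kept) / len(kept)

print(f"mean             = {trimmed_mean(scores, 0.0):.2f}")
print(f"10% trimmed mean = {trimmed_mean(scores, 0.10):.2f}")
```

With n = 37 and 10% trimming, three scores are dropped from each end (32, 40, 47 and 87, 90, 91), and the remaining 31 scores are averaged.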

3. Population and Sample Medians: The median divides the data into two equal parts, half below the
median and half above.

Calculating the sample median, $\tilde{x}$:

Step 1: Order the observations from smallest to largest (include any repeated
values).
Step 2: Calculate the median as follows:

when n is odd: $\tilde{x}$ = [(n+1)/2]th ordered value = the single middle value

when n is even: $\tilde{x}$ = [(n/2)th ordered value + (n/2 + 1)th ordered value] / 2
= average of the two middle values

Population median, $\tilde{\mu}$: calculated in the same manner as the sample median, with N in
place of n.
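The two-step median rule translates directly into code. The sketch below applies it to an odd-sized sample (the 37 exam scores) and an even-sized one (the 8 pull-off forces); `sample_median` is an illustrative name.

```python
# Final exam scores (n = 37, odd) and pull-off forces (n = 8, even)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

def sample_median(values):
    """Middle ordered value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(values)            # Step 1: order the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # The ((n+1)/2)th ordered value sits at 0-based index n // 2.
        return ordered[mid]
    # The (n/2)th and (n/2 + 1)th ordered values sit at indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median(scores))   # 19th ordered score
print(sample_median(forces))   # average of the 4th and 5th ordered forces
```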

Three Different Shapes for a Population Distribution

4. Mode: Most frequently occurring data value
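A data set can have more than one mode. As a small illustration, the sketch below uses `statistics.multimode` from the Python standard library (available from Python 3.8) on the exam scores of Example 1, where two values tie for the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

modes = multimode(scores)                 # every value with the highest count
highest_count = max(Counter(scores).values())

print(modes, highest_count)               # [72, 83] 3
```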

5. Quartiles: Divide the ordered data into four (4) equal parts.

The first or lower quartile, Q1: a value that has approximately one-fourth (25%) of the
observations below it and approximately 75% of the observations above it.

The second quartile, Q2: has approximately one-half (50%) of the observations below it;
it is equal to the median.

The third quartile, Q3: has approximately three-fourths (75%) of the observations below it.
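Because the quartile definitions say "approximately", several computing conventions exist and they can give slightly different values. The sketch below uses the default ("exclusive") method of `statistics.quantiles` from the Python standard library, which is one such convention and is not necessarily the one used in the course or by Minitab.

```python
from statistics import quantiles

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(scores, n=4)

print(q1, q2, q3)   # 59.5 72.0 81.5
```

Under this convention Q1 falls halfway between the 9th and 10th ordered scores, Q2 is the 19th ordered score (the median), and Q3 falls halfway between the 28th and 29th.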
6. Percentiles: Similar to the quartile calculation, a data set (sample or population) can be
divided into 100 equal parts using percentiles. For example, the 90th percentile
separates the highest 10% from the bottom 90%, and so on (will be covered in Ch. 4
in more detail).

7. Categorical Data and Sample Proportions

When data are categorical (qualitative), they can be summarized by sample proportions.

Example: Consider the cars in the BU parking lot.

Let n = number of cars in a sample selected from the lot.

Categorize the cars by the type of transmission they have:
Category A: cars with manual transmission; let x = number of such cars in the sample
Category B: cars with automatic transmission; the number of such cars in the sample is n - x
Then the sample proportions are:
proportion of manual cars = x/n
proportion of automatic cars = (n-x)/n = 1 - (x/n)
Similarly, p = population proportion (p represents the proportion of individuals in the
entire population that fall in the category). x/n is used to estimate p. 0 ≤ x/n ≤ 1 and 0 ≤ p ≤ 1.
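The two sample proportions can be computed as follows. The counts used here (n = 50 cars, x = 15 with manual transmission) are hypothetical values chosen for illustration, and `sample_proportions` is an illustrative helper name.

```python
def sample_proportions(x, n):
    """Return (x/n, (n - x)/n) for x items of category A in a sample of size n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p_a = x / n
    p_b = (n - x) / n
    return p_a, p_b

# Hypothetical sample: 50 cars, 15 of them with manual transmission.
p_manual, p_automatic = sample_proportions(15, 50)

print(f"manual: {p_manual:.2f}, automatic: {p_automatic:.2f}")
```

The input check mirrors the constraint 0 ≤ x/n ≤ 1, and the two proportions always add up to 1.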

(b) MEASURES OF VARIABILITY

8. Population and sample variance and standard deviation: Variance is a measure of the
spread of the data.

Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

N = finite number of observations in the population

Standard deviation: it is a measure of the spread of the data using the same units as the
data.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (positive square root)

The sample variance of the sample x1, x2, ..., xn of n values of X is given by

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

n = number of observations in the sample (sample size)

An alternative expression for the numerator of $s^2$ is:

Complete it!

Sample standard deviation: $s = \sqrt{s^2}$ (positive square root)
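The definitions can be checked numerically on the pull-off force data. The sketch below computes the sample variance with the n - 1 divisor directly from the definition; the function names are illustrative.

```python
import math

# Pull-off forces of the eight prototype connectors
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)

def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

s2 = sample_variance(forces)
s = sample_std(forces)

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

For these data the sample mean is 13.0 and the squared deviations add up to 1.60, so the sample variance is 1.60 / 7, about 0.2286, and s is about 0.478, in the same units as the data.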

Properties of $s^2$

Let x1, x2, ..., xn be any sample and c be any nonzero constant.

1. If y1 = x1 + c, y2 = x2 + c, ..., yn = xn + c, then $s_y^2 = s_x^2$.

That is, if a constant c is added to (or subtracted from) each data value xi,
the variance is unchanged.

2. If y1 = c x1, y2 = c x2, ..., yn = c xn, then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$.

That is, multiplying each data point xi by c results in $s^2$ being
multiplied by a factor of $c^2$.

Prove both properties!
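Before attempting the proofs, both properties can be checked numerically. The sketch below is a sanity check on one sample and one constant (c = -2.5, an arbitrary choice), not a proof. It uses `variance` and `stdev` from the Python standard library, which both use the n - 1 divisor.

```python
import math
from statistics import stdev, variance

# Pull-off forces of the eight prototype connectors
x = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
c = -2.5   # arbitrary nonzero constant; negative to show why |c| is needed

shifted = [v + c for v in x]   # y_i = x_i + c
scaled = [c * v for v in x]    # y_i = c * x_i

# Property 1: adding a constant leaves the variance unchanged.
shift_ok = math.isclose(variance(shifted), variance(x), rel_tol=1e-9)

# Property 2: scaling multiplies the variance by c^2 and the std. dev. by |c|.
scale_var_ok = math.isclose(variance(scaled), c ** 2 * variance(x), rel_tol=1e-9)
scale_std_ok = math.isclose(stdev(scaled), abs(c) * stdev(x), rel_tol=1e-9)

print(shift_ok, scale_var_ok, scale_std_ok)
```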

9. Box Plots

Description of a boxplot

After the n observations in a data set are ordered from smallest to largest, the lower (upper) fourth is the median of the smallest
(largest) half of the data, where the median is included in both halves if n is odd. A measure of the spread that is resistant to outliers
is the fourth spread fs = upper fourth – lower fourth.

Outliers: Any observation farther than 1.5fs from the closest fourth is an outlier. An outlier is extreme if it is more than 3fs from the
nearest fourth, and it is mild otherwise.
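The fourths, the fourth spread, and the outlier rule can be sketched as below, following the description above (the median is included in both halves when n is odd). The function names are illustrative. The exam scores of Example 1 turn out to contain no outliers under this rule, so two artificial scores are appended in the usage lines purely to show a mild and an extreme case.

```python
from statistics import median

# Final exam scores from Example 1 (N = 37)
scores = [32, 40, 47, 52, 55, 56, 56, 58, 59, 60, 63, 66, 67, 68, 69,
          70, 70, 72, 72, 72, 73, 74, 74, 77, 77, 78, 80, 81, 82, 83,
          83, 83, 84, 85, 87, 90, 91]

def fourths(values):
    """Lower and upper fourths: medians of the smallest and largest halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                 # includes the median when n is odd
    return median(ordered[:half]), median(ordered[n - half:])

def classify_outliers(values):
    """Return (fs, mild outliers, extreme outliers) using the 1.5 fs / 3 fs rule."""
    lower, upper = fourths(values)
    fs = upper - lower                  # fourth spread
    mild, extreme = [], []
    for v in values:
        # Distance from the closest fourth; zero for values between the fourths.
        distance = lower - v if v < lower else v - upper if v > upper else 0
        if distance > 3 * fs:
            extreme.append(v)
        elif distance > 1.5 * fs:
            mild.append(v)
    return fs, mild, extreme

print(fourths(scores))                    # (60, 81)
print(classify_outliers(scores))          # (21, [], [])
print(classify_outliers(scores + [120]))  # artificial value 120 -> mild outlier
print(classify_outliers(scores + [150]))  # artificial value 150 -> extreme outlier
```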

Boxplot for the compressive strength data.

Boxplots for Fabs and Fnet forces: Subjects categorized by gender (source: Eksioglu, M.,
Kızılaslan, K., Steering-wheel grip force characteristics of drivers as a function of gender, speed,
and road condition. International Journal of Industrial Ergonomics, 2008).

[Figure: Boxplots. Vertical axis: Maximum Grip Strength (kgf), 15 to 40; horizontal axis: Age-Group
(yrs), groups 18-29, 30-39, 40-49, 50-59, 60-69.]

Boxplots of maximum grip strength of Turkish females stratified by age groups (Source: Ekşioğlu and Baykar,
2009)

[Figure: Boxplots. Vertical axis: Maximum Grip Strength (kgf), 10 to 70; horizontal axis: Gender
(F, M).]

Boxplots of grip strength stratified by gender for NME and students (18-69 yrs) (Source: Ekşioğlu and Baykar,
2009)

10. Time Series (Sequence) Data Plots: A time series is a sequence of observations that are
ordered in time (or space). If observations are made on some phenomenon over time, it
is most sensible to display the data in the order in which they arose, particularly since
successive observations will probably be dependent. Time series are best displayed in a scatter
plot, with the series value X plotted on the vertical axis and time t on the horizontal axis. Time is
called the independent variable (in this case, however, something over which you have little
control). There are two kinds of time series data:

1. Continuous, where we have an observation at every instant of time, e.g. lie detectors,
electrocardiograms. We denote the observation X at time t by X(t).

2. Discrete, where we have an observation at (usually regularly) spaced intervals. We
denote this as $X_t$.

Examples
Economics - weekly share prices, monthly profits
Meteorology - daily rainfall, wind speed, temperature
Sociology - crime figures (number of arrests, etc), employment figures
