
INTRODUCTION TO STATISTICS AND STATISTICAL INFERENCE

Summary Measures

Central Tendency: Mean, Median, Mode
Location: Maximum, Minimum, Percentile, Decile, Quartile
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness
Kurtosis

Measures of Central Tendency
• A single value that is used to identify the center of the data
• It is thought of as a typical value of the distribution
• Precise yet simple
• Most representative value of the data

Mean
• Most common measure of the center
• Also known as the arithmetic average

Population Mean: $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$

Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

Properties of the Mean
• may not be an actual observation in the data set
• can be applied to data measured on at least the interval level
• easy to compute
• every observation contributes to the value of the mean
• subgroup means can be combined to come up with a group mean (use the weighted mean)
• easily affected by extreme values

[Figure: two number lines. Without an extreme value (axis 0 to 10), Mean = 5; with an extreme value (axis extended to 14), Mean = 6.]

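Two of the properties above can be checked with a few lines of Python. This is only a sketch: the function name `combined_mean` and all of the data values are hypothetical, chosen for illustration.

```python
from statistics import mean

def combined_mean(subgroup_means, subgroup_sizes):
    """Weighted mean of subgroup means, using subgroup sizes as weights."""
    total = sum(m * n for m, n in zip(subgroup_means, subgroup_sizes))
    return total / sum(subgroup_sizes)

# Hypothetical data, chosen only for illustration.
group_a = [1, 3, 5, 7, 9]
group_b = [2, 4, 6]
overall = combined_mean([mean(group_a), mean(group_b)],
                        [len(group_a), len(group_b)])
print(overall, mean(group_a + group_b))    # both are 4.625

# Replacing the largest value with an extreme one pulls the mean upward.
with_extreme = [1, 3, 5, 7, 14]
print(mean(group_a), mean(with_extreme))   # 5 and 6
```

The weighted mean agrees with the mean of the pooled observations because each subgroup mean is weighted by its subgroup size.
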
Median
• Divides the ordered observations into two equal parts
• If the number of observations is odd, the median is the middle number.
• If the number of observations is even, the median is the average of the 2 middle numbers.
• The sample median is denoted as $\tilde{x}$, while the population median is denoted as $\tilde{\mu}$.

Properties of a Median
• may not be an actual observation in the data set
• can be applied to data measured on at least the ordinal level
• a positional measure; not affected by extreme values

[Figure: two number lines, one on a 0 to 10 axis and one extended to 14 by an extreme value; Median = 5.]

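The odd and even cases, and the median's resistance to extreme values, can be illustrated with Python's standard `statistics.median`. The data values below are hypothetical.

```python
from statistics import median

odd_count = [7, 1, 5, 3, 9]        # hypothetical data, 5 observations
even_count = [7, 1, 5, 3, 9, 11]   # hypothetical data, 6 observations

print(median(odd_count))    # 5: the middle value of 1 3 5 7 9
print(median(even_count))   # 6.0: the average of 5 and 7

# An extreme value leaves the median where it was.
print(median([1, 3, 5, 7, 14]))   # 5
```
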
Mode
• the value that occurs most frequently
• nominal average
• may or may not exist

[Figure: two dot plots. On the 0 to 6 axis there is No Mode; on the 0 to 14 axis, Mode = 9.]

Properties of a Mode
• can be used for qualitative as well as quantitative data
• may not be unique
• not affected by extreme values
• can be computed for ungrouped and grouped data
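A minimal sketch of finding the mode for ungrouped data follows. The helper `modes` is hypothetical, and it adopts one possible convention: when every value occurs equally often, the data set is treated as having no mode. The data values are illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] when all
    values occur equally often (treated here as 'no mode')."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == highest]

print(modes([1, 2, 3, 4, 5]))          # []      no mode
print(modes([2, 9, 9, 9, 13]))         # [9]     a single mode
print(modes([1, 1, 2, 2, 3]))          # [1, 2]  the mode is not unique
print(modes(["red", "blue", "red"]))   # ['red'] works for qualitative data
```
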
Mean, Median & Mode
Use the mean when:
• sampling stability is desired
• other measures are to be computed

Use the median when:
• the exact midpoint of the distribution is desired
• there are extreme observations

Use the mode when:
• the "typical" value is desired
• the dataset is measured on a nominal scale

Measures of Location
• A measure of location summarizes a data set by giving a value within the range of the data values that describes its location relative to the entire data set arranged according to magnitude (called an array).

Some Common Measures:
• Minimum, Maximum
• Percentiles, Deciles, Quartiles

Maximum and Minimum
• Minimum is the smallest value in the data set, denoted as MIN.
• Maximum is the largest value in the data set, denoted as MAX.

Percentiles
• Numerical measures that give the position of a data value relative to the entire data set.
• Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
• The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.

EXAMPLE
Suppose LJ was told that relative to the other scores on a certain test, his score was the 95th percentile.
• This means that (at least) 95% of those who took the test had scores less than or equal to LJ's score, while (at least) 5% had scores greater than or equal to LJ's.

Deciles
• Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
• The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile, and so on.

Quartiles
• Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
• The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.
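The correspondence between percentiles, deciles, and quartiles can be sketched with `statistics.quantiles` from the Python standard library. The data are the 15 pulse rates used in the Range example of this section. Textbooks and software use several slightly different interpolation rules for percentiles; the default "exclusive" method is assumed here and is not necessarily the rule the lecture intends.

```python
from statistics import quantiles

# Pulse rates of 15 male residents, taken from the Range example.
pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

percentiles = quantiles(pulse, n=100)   # P1 ... P99
deciles = quantiles(pulse, n=10)        # D1 ... D9
quartiles = quantiles(pulse, n=4)       # Q1, Q2, Q3

print(quartiles)                                           # [60.0, 71.0, 78.0]
print(percentiles[24], percentiles[49], percentiles[74])   # 60.0 71.0 78.0
print(deciles[0] == percentiles[9])                        # True: D1 is P10
```
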

Measures of Variation
• A measure of variation is a single value that is used to describe the spread of the distribution.
• A measure of central tendency alone does not uniquely describe a distribution.

A look at dispersion

[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Two Types of Measures of Dispersion
Absolute Measures of Dispersion:
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
Relative Measure of Dispersion:
• Coefficient of Variation

Range (R)
The difference between the maximum and minimum values in a data set, i.e.
R = MAX - MIN
Example: Pulse rates of 15 male residents of a certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

R = 85 - 54 = 31

Some Properties of the Range
• The larger the value of the range, the more dispersed the observations are.
• It is quick and easy to understand.
• A rough measure of dispersion.

Inter-Quartile Range (IQR)
The difference between the third quartile and the first quartile, i.e.
IQR = Q3 - Q1
Example: Pulse rates of 15 male residents of a certain village

54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

IQR = 78 - 60 = 18
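The range and IQR for the pulse-rate data can be reproduced in Python. This sketch assumes the default "exclusive" rule of `statistics.quantiles`, which happens to give Q1 = 60 and Q3 = 78 for these 15 values; the replacement value 140 is hypothetical and is used only to show how each measure reacts to an extreme observation.

```python
from statistics import quantiles

pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

data_range = max(pulse) - min(pulse)
q1, _, q3 = quantiles(pulse, n=4)   # default "exclusive" method
iqr = q3 - q1

print(data_range)   # 31
print(iqr)          # 18.0

# Replacing the maximum with an extreme value stretches the range, not the IQR.
stretched = pulse[:-1] + [140]      # hypothetical extreme value
q1_s, _, q3_s = quantiles(stretched, n=4)
print(max(stretched) - min(stretched), q3_s - q1_s)   # 86 18.0
```
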

Some Properties of IQR
• Reduces the influence of extreme values.
• Not as easy to calculate as the Range.

Variance
• important measure of variation
• shows variation about the mean

Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
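The two variance formulas differ only in the divisor, which a direct translation into Python makes visible. The function names and the data below are hypothetical, chosen for illustration.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean = 5
print(population_variance(data))   # 4.0   (32 / 8)
print(sample_variance(data))       # 32 / 7, about 4.571
```
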

Standard Deviation (SD)
• most important measure of variation
• square root of the variance
• has the same units as the original data

Population SD: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample SD: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Computation of Standard Deviation

(Sample) Data: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$s = \sqrt{\dfrac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\dfrac{130}{7}} = 4.309$
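The hand computation can be cross-checked with Python's standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N.

```python
from statistics import mean, stdev, pstdev

data = [10, 12, 14, 15, 17, 18, 18, 24]   # sample data from the slide

print(mean(data))               # 16
print(round(stdev(data), 3))    # 4.309, the sample SD (divides by n - 1)
print(round(pstdev(data), 3))   # 4.031, the population SD (divides by N)
```
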

Remarks on Standard Deviation
• If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
• If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.

Comparing Standard Deviations
(comparable only when the units of measure are the same and the means are not too different from each other)

[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Comparing Standard Deviations
Example: Team A - Heights of five marathon players in inches
65 65 65 65 65
Mean = 65
s = 0

Example: Team B - Heights of five marathon players in inches
62 67 66 70 60
Mean = 65
s = 4.0

Properties of Standard Deviation
• It is the most widely used measure of dispersion. (Chebyshev's Inequality)
• It is based on all the items and is rigidly defined.
• It is used to test the reliability of measures calculated from samples.
• The standard deviation is sensitive to the presence of extreme values.
• It is not easy to calculate by hand (unlike the range).

Chebyshev's Rule
It permits us to make statements about the percentage of observations that must be within a specified number of standard deviations from the mean.
The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any positive number larger than 1.
This rule applies to any distribution.

Chebyshev's Rule
For any data set with mean ($\mu$) and standard deviation (SD), the following statements apply:
At least 75% of the observations are within 2SD of its mean.
At least 88.9% of the observations are within 3SD of its mean.

Illustration
[Figure: distribution with the region within 2SD of the mean labeled "At least 75%"]
At least 75% of the observations are within 2SD of its mean.

Example
The midterm exam scores of 100 STAT 1 students last semester had a mean of 65 and a standard deviation of 8 points.
Applying Chebyshev's Rule, we can say that:
1. At least 75% of the students had scores between 49 and 81.
2. At least 88.9% of the students had scores between 41 and 89.
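The numbers in this example follow directly from the bound 1 - 1/k^2 and the interval mean ± k·SD. A small sketch, with hypothetical function names:

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be larger than 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

for k in (2, 3):
    low, high = chebyshev_interval(65, 8, k)
    print(f"k={k}: at least {chebyshev_bound(k):.1%} between {low} and {high}")
# k=2: at least 75.0% between 49 and 81
# k=3: at least 88.9% between 41 and 89
```
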
Coefficient of Variation (CV)
• measure of relative variation
• usually expressed in percent
• shows variation relative to the mean
• used to compare 2 or more groups
• Formula: $CV = \dfrac{SD}{\text{Mean}} \times 100\%$

Comparing CVs
• Stock A: Average Price = P50, SD = P5, CV = 10%
• Stock B: Average Price = P100, SD = P5, CV = 5%
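The two stocks have the same SD, yet Stock A varies twice as much relative to its average price. A minimal check of the CV values, using a hypothetical helper function:

```python
def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean."""
    return sd * 100 / mean

print(coefficient_of_variation(5, 50))    # 10.0 (Stock A)
print(coefficient_of_variation(5, 100))   # 5.0  (Stock B)
```
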
Measure of Skewness
• Describes the degree of departure of the distribution of the data from symmetry.
• The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as

$SK = \dfrac{3(\text{Mean} - \text{Median})}{SD}$

What is Symmetry?

A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the mirror image of the distribution to the right of the mean. Likewise, a symmetric distribution has SK = 0 since its mean is equal to its median and its mode.
Measure of Skewness
[Figure: two skewed distributions]
SK > 0: positively skewed
SK < 0: negatively skewed
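The coefficient SK = 3(Mean - Median)/SD can be computed as in the sketch below, which applies it to the Team B heights from the earlier example. It assumes the sample SD (divisor n - 1), since the formula as given does not say which SD is meant; the function name and the symmetric data set are hypothetical.

```python
from statistics import mean, median, stdev

def skewness_coefficient(values):
    """SK = 3 * (mean - median) / SD, using the sample SD."""
    return 3 * (mean(values) - median(values)) / stdev(values)

team_b = [62, 67, 66, 70, 60]            # heights from the Team B example
print(skewness_coefficient(team_b))      # -0.75, so negatively skewed

symmetric = [10, 12, 14, 16, 18]         # hypothetical symmetric data
print(skewness_coefficient(symmetric))   # 0.0
```
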

Measure of Kurtosis
• Describes the extent of peakedness or flatness of the distribution of the data.
• Measured by the coefficient of kurtosis (K), computed as

$K = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^4}{N \sigma^4} - 3$
Measure of Kurtosis
[Figure: three distributions of differing peakedness]
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
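A sketch of the kurtosis coefficient, read here as the fourth central moment divided by the squared population variance, minus 3. The function name and both data sets are hypothetical, chosen so that one case falls on each side of K = 0.

```python
def kurtosis_coefficient(values):
    """K = sum((x - mu)**4) / (N * sigma**4) - 3, with population mu and sigma."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n        # sigma squared
    fourth_moment = sum((x - mu) ** 4 for x in values) / n
    return fourth_moment / variance**2 - 3

flat = [1, 2, 3, 4, 5]                # hypothetical, evenly spread data
peaked = [1, 5, 5, 5, 5, 5, 5, 9]     # hypothetical data with a tall center
print(kurtosis_coefficient(flat))     # about -1.3 (K < 0, platykurtic)
print(kurtosis_coefficient(peaked))   # 1.0 (K > 0, leptokurtic)
```
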

Box-and-Whiskers Plot
• Concerned with the symmetry of the distribution and incorporates measures of location in order to study the variability of the observations.
• Also called a box plot; it displays the 5-number summary (Min, Q1, Q2, Q3, and Max).
• Suitable for identifying outliers.
Box-and-Whiskers Plot
The diagram is made up of a box which lies between the first and third quartiles.
The whiskers are the straight lines extending from the ends of the box to the smallest and largest values that are not outliers.

Steps to Construct a Box-and-Whiskers Plot

Step 1: Draw a rectangular box whose left edge is at Q1 and whose right edge is at Q3, so the box width is the IQR. Then draw a vertical line segment inside the box where the median is found.

[Figure: box with Q1 = 75, Md = 78, Q3 = 85]

Step 2: Place marks at a distance of 1.5 IQR from either end of the box. (1.5 IQR = 15)

[Figure: marks at 60 and 100, each 1.5 IQR from the box; Q1 = 75, Md = 78, Q3 = 85]

Step 3: Draw the horizontal line segments known as the whiskers from each end of the box to the largest and smallest values in the data set that are not outliers. (An observation more than 1.5 IQR beyond either end of the box is an outlier.)

Step 4: For every outlier, draw a dot. If two or more dots have the same value, draw the dots side by side.

[Figure: completed box plot with axis values 55, 60, 75, 78, 85, 98, 100; Q1 = 75, Md = 78, Q3 = 85, marks at 60 and 100 (1.5 IQR from the box), and dots marking outliers]
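The marks, whisker ends, and outliers from the steps above can be worked out as follows. This is a sketch only: the function names are hypothetical, the quartiles Q1 = 75 and Q3 = 85 are taken from the figure and passed in rather than computed, and the list of observations is invented for illustration.

```python
def outlier_marks(q1, q3):
    """Positions 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(values, q1, q3):
    """Return (whisker_low, whisker_high, outliers) for the given quartiles."""
    low_mark, high_mark = outlier_marks(q1, q3)
    inside = [v for v in values if low_mark <= v <= high_mark]
    outliers = [v for v in values if v < low_mark or v > high_mark]
    return min(inside), max(inside), outliers

print(outlier_marks(75, 85))   # (60.0, 100.0)

# Hypothetical observations; the quartiles are passed in, not computed from them.
values = [55, 55, 68, 75, 77, 78, 80, 85, 90, 98]
print(split_outliers(values, 75, 85))   # (68, 98, [55, 55])
```

Here the whiskers would run from 68 to 98, and the two observations at 55 would be drawn as side-by-side dots.
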
