
Statistics for Economics and Business

Agus Salim
Session One
Introduction and Frequency Distributions
The Meaning and Uses of Statistics
Types of Data: Quantitative Data and Qualitative Data
Populations and Samples
Measures of Central Tendency and Dispersion
Measures of central tendency in brief
Standard Deviation
Coefficient of Variation
Computing Quartiles and Percentiles
Before We Begin
Choose a class representative
Create a class email account
Teaching method to be used: SCL
Forming groups
Group assignments


The Meaning and Uses of Statistics
What is statistics?
Statistics is the science of collecting, organizing,
presenting, analyzing, and interpreting numerical data
to assist in making more effective decisions.
What is statistics used for?
Statistical techniques are used extensively in
marketing, accounting, and quality control, and by consumers,
professional sports people, hospital administrators,
educators, politicians, physicians, etc.
A. Qualitative or Attribute Data (variable) - the
characteristic being studied is nonnumeric.
EXAMPLES: gender, religious affiliation, type of
automobile owned, state of birth, and eye color.

B. Quantitative Data (variable) - information is reported
numerically.
EXAMPLES: balance in your checking account, minutes
remaining in class, or number of children in a family.

Types of Data:
Qualitative Data and Quantitative Data
[Figure: Summary of Types of Data]
Populations and Samples
A population is a collection of all possible individuals, objects, or
measurements of interest.

A sample is a portion, or part, of the population of interest.
The central tendency is the middle or
typical values of a distribution.
Central tendency can be assessed using a
dot plot, histogram or more precisely with
numerical statistics.
Central Tendency
| Statistic | Formula | Excel Formula | Pro | Con |
|---|---|---|---|---|
| Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Influenced by extreme values. |
| Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Ignores extremes and can be affected by gaps in data values. |
| Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data. |
| Midrange | $\frac{x_{\min} + x_{\max}}{2}$ | =0.5*(MIN(Data)+MAX(Data)) | Easy to understand and calculate. | Influenced by extreme values and ignores most data values. |
| Geometric mean (G) | $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ | =GEOMEAN(Data) | Useful for growth rates and mitigates high extremes. | Less familiar and requires positive data. |
| Trimmed mean | Same as the mean except omit highest and lowest k% of data values (e.g., 5%) | =TRIMMEAN(Data, %) | Mitigates effects of extreme values. | Excludes some data values that could be relevant. |
Central Tendency
Six Measures of Central Tendency
A familiar measure of central tendency.
In Excel, use function =AVERAGE(Data)
where Data is an array of data values.
Population formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Central Tendency
Mean
For the sample of n = 37 car brands:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{87 + 93 + 98 + \cdots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38$
Central Tendency
Mean
Brand Defects Per 100
Lexus 87
Cadillac 93
Jaguar 98
Honda 99
Buick 100
Mercury 100
Hyundai 102
Infiniti 104
Toyota 104
Mercedes-Benz 106
Audi 109
BMW 109
Oldsmobile 110
Volvo 113
Acura 117
Chevrolet 119
Chrysler 120
Dodge 121
Lincoln 121
Pontiac 122
Subaru 123
GMC 127
Ford 130
Mitsubishi 130
Saab 133
Jeep 136
Mini 142
Land Rover 148
Saturn 149
Suzuki 149
Kia 153
Nissan 154
Mazda 157
Scion 158
Porsche 159
Volkswagen 164
Hummer 173
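The mean of the table above can be checked with a few lines of Python. This is only a sketch: the list name `defects` and the variable names are illustrative choices, and the values are copied from the table.

```python
# Defects per 100 vehicles for the 37 brands listed in the table above.
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110,
           113, 117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133,
           136, 142, 148, 149, 149, 153, 154, 157, 158, 159, 164, 173]

n = len(defects)      # sample size
total = sum(defects)  # sum of all observations
x_bar = total / n     # sample mean: total divided by n

print(n, total, round(x_bar, 2))
```

The printed sample size, total, and mean should match the slide's n = 37, 4639, and 125.38.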
Arithmetic mean is the most familiar average.
Affected by every sample item.
The balancing point or fulcrum for the data.
Central Tendency
Characteristics of the Mean
Regardless of the shape of the distribution, the signed
deviations of the data points from the mean always sum
to zero:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
Central Tendency
Characteristics of the Mean
Consider the following
asymmetric distribution of quiz
scores whose mean = 65.
$\sum_{i=1}^{n} (x_i - \bar{x})$
= (42 - 65) + (60 - 65) + (70 - 65) + (75 - 65) + (78 - 65)
= (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
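As a quick check of this property, the sketch below recomputes the deviations for the five quiz scores. The variable names are illustrative. It also sums the absolute deviations to show that the zero-sum property holds for signed deviations only.

```python
# Quiz scores from the example; their mean is 65.
scores = [42, 60, 70, 75, 78]

x_bar = sum(scores) / len(scores)
deviations = [x - x_bar for x in scores]       # signed deviations from the mean
abs_deviations = [abs(d) for d in deviations]  # distances ignore the sign

print(x_bar)                # 65.0
print(sum(deviations))      # signed deviations cancel out
print(sum(abs_deviations))  # absolute distances do not cancel
```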
The median (M) is the 50th percentile or midpoint
of the sorted sample data.
M separates the upper and lower half of the
sorted observations.
If n is odd, the median is the middle observation
in the data array.
If n is even, the median is the average of the
middle two observations in the data array.
Central Tendency
Median
For n = 8, the median is between the fourth and fifth
observations in the data array.
For n = 9, the median is the fifth observation in the data array.
Consider the following n = 6 data values:
11 12 15 17 21 32
What is the median?
$M = (x_3 + x_4)/2 = (15 + 17)/2 = 16$
The median 16 falls between the third and fourth values: 11 12 15 [16] 17 21 32
For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
Central Tendency
Median
Consider the following n = 7 data values:
12 23 23 25 27 34 41
What is the median?
$M = x_4 = 25$
12 23 23 [25] 27 34 41
For odd n, Median = $x_{(n+1)/2}$
(n+1)/2 = (7+1)/2 = 8/2 = 4
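The even and odd position rules can be sketched as one small function. The name `median_by_position` is hypothetical, and the code converts the slides' 1-based positions to Python's 0-based indexing.

```python
def median_by_position(values):
    """Median using the position rules for odd and even n."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: observation at position (n + 1) / 2 (1-based).
        return data[(n + 1) // 2 - 1]
    # Even n: average of the observations at positions n/2 and n/2 + 1.
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median_by_position([11, 12, 15, 17, 21, 32]))      # even n example
print(median_by_position([12, 23, 23, 25, 27, 34, 41]))  # odd n example
```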
Central Tendency
Median
Use Excel's function =MEDIAN(Data) where
Data is an array of data values.
For the 37 vehicle quality ratings (odd n) the
position of the median is
(n+1)/2 = (37+1)/2 = 19.

So, the median is $x_{19} = 121$.
When there are several duplicate data values,
the median does not provide a clean 50-50
split in the data.
Central Tendency
Median
The median is insensitive to extreme data values.
For example, consider the following quiz scores for 3
students:
Tom's scores:
20, 40, 70, 75, 80 Mean = 57, Median = 70, Total = 285
Jake's scores:
60, 65, 70, 90, 95 Mean = 76, Median = 70, Total = 380
Mary's scores:
50, 65, 70, 75, 90 Mean = 70, Median = 70, Total = 350
What does the median for each student tell you?
Central Tendency
Characteristics of the Median
The most frequently occurring data value.
Similar to mean and median if data values
occur often near the center of sorted data.
May have multiple modes or no mode.
Central Tendency
Mode
For example, consider the following quiz scores for 4
students:
Lee's scores:
60, 70, 70, 70, 80 Mean = 70, Median = 70, Mode = 70
Pat's scores:
45, 45, 70, 90, 100 Mean = 70, Median = 70, Mode = 45
Sam's scores:
50, 60, 70, 80, 90 Mean = 70, Median = 70, Mode = none
Xiao's scores:
50, 50, 70, 90, 90 Mean = 70, Median = 70, Modes = 50, 90
What does the mode for each student tell you?
Central Tendency
Mode
Easy to define, not easy to calculate in large
samples.
Use Excel's function =MODE(Array)
- will return #N/A if there is no mode.
- will return first mode found if multimodal.
May be far from the middle of the distribution
and not at all typical.
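The "no mode" and "multiple modes" cases can be illustrated with a short sketch. The helper `modes` is a hypothetical name. Unlike Excel's =MODE, it returns every mode, and it returns an empty list when no value repeats. The inputs are the four students' quiz scores from the mode example.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([60, 70, 70, 70, 80]))   # Lee
print(modes([45, 45, 70, 90, 100]))  # Pat
print(modes([50, 60, 70, 80, 90]))   # Sam
print(modes([50, 50, 70, 90, 90]))   # Xiao
```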
Central Tendency
Mode
Generally isn't useful for continuous data since
data values rarely repeat.
Best for attribute data or a discrete variable with
a small range (e.g., Likert scale).
Central Tendency
Mode
Consider the following P/E ratios for a random sample of
68 Standard & Poor's 500 stocks.
What is the mode?
Central Tendency
Example: Price/Earnings Ratios and Mode
7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14
14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19
19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26
26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91
Excel's descriptive statistics results are:
Mean 22.7206
Median 19
Mode 13
Range 84
Minimum 7
Maximum 91
Sum 1545
Count 68
The mode 13 occurs 7 times, but what does the dot plot show?
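The descriptive statistics for the P/E sample can be reproduced with Python's standard `statistics` module. This is a sketch, not the deck's method: the list name `pe` is illustrative and the values are copied from the data listing.

```python
import statistics

# The 68 P/E ratios from the data listing, already sorted.
pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
      14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]

summary = {
    "Mean": round(statistics.mean(pe), 4),
    "Median": statistics.median(pe),
    "Mode": statistics.mode(pe),
    "Range": max(pe) - min(pe),
    "Minimum": min(pe),
    "Maximum": max(pe),
    "Sum": sum(pe),
    "Count": len(pe),
}

for name, value in summary.items():
    print(name, value)
```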
Central Tendency
Example: Price/Earnings Ratios and Mode
The dot plot shows local modes (a peak with
valleys on either side) at 10, 13, 15, 19, 23, 26, 29.
These multiple modes suggest that the mode is not a
stable measure of central tendency.
Central Tendency
Example: Price/Earnings Ratios and Mode
Points scored by the winning NCAA football team tend
to have modes at multiples of 7 because a touchdown
with the extra point yields 7 points.
Central Tendency
Example: Rose Bowl Winners Points
Consider the dot plot of the points scored by the winning
team in the first 87 Rose Bowl games.
What is the mode?
A bimodal distribution refers to the shape of the
histogram rather than the mode of the raw data.
It occurs when dissimilar populations are combined in one
sample.
Central Tendency
Mode
Compare mean and median or look at
histogram to determine degree of
skewness.
Central Tendency
Skewness
| Distribution's Shape | Histogram Appearance | Statistics |
|---|---|---|
| Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median |
| Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median |
| Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median |
Central Tendency
Symptoms of Skewness
For the sample of J.D. Power quality ratings, the
mean (125.38) exceeds the median (121). What
does this suggest?
Central Tendency
Skewness
The geometric mean (G) is a
multiplicative average.
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$
For the J.D. Power quality data (n = 37):
$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$
In Excel use =GEOMEAN(Array)
The geometric mean tends to mitigate the
effects of high outliers.
Central Tendency
Geometric Mean
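A minimal sketch of the geometric mean for the 37 defect rates follows. It computes G two ways: from the average of the logarithms, which avoids very large intermediate products, and from the n-th root of the product. The variable names are illustrative.

```python
import math

# Defects per 100 vehicles for the 37 brands in the J.D. Power table.
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110,
           113, 117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133,
           136, 142, 148, 149, 149, 153, 154, 157, 158, 159, 164, 173]
n = len(defects)

# Geometric mean as the exponential of the mean log value.
g = math.exp(sum(math.log(x) for x in defects) / n)

# Same quantity as the n-th root of the product of all values.
g_direct = math.prod(defects) ** (1 / n)

arithmetic_mean = sum(defects) / n
print(round(g, 2), round(g_direct, 2), round(arithmetic_mean, 2))
```

The geometric mean comes out below the arithmetic mean, which is consistent with its tendency to mitigate high values.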
A variation on the geometric mean is used to find
the average growth rate for a time series of n observations.
Because n observations span n - 1 periods, the root is of order n - 1:
$G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$
For example, from 1998 to 2002, Spirit Airlines revenues are:
Year Revenue (mil)
1998 131
1999 227
2000 311
2001 354
2002 403
The average growth rate is given by taking the geometric
mean of the ratios of each year's revenue to the
preceding year. Five years of data give four ratios, so the
fourth root applies.
Due to cancellations, only the first and last years are
relevant:
$G = \sqrt[4]{\left(\frac{227}{131}\right)\left(\frac{311}{227}\right)\left(\frac{354}{311}\right)\left(\frac{403}{354}\right)} - 1 = \sqrt[4]{\frac{403}{131}} - 1$
= 1.3244 - 1 = .324 or 32.4% per year
In Excel use =(403/131)^(1/4)-1
Central Tendency
Growth Rates
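The cancellation argument can be checked numerically. In this sketch the five annual revenues give four year-to-year ratios, so the number of periods is taken as n - 1 = 4. That choice is an assumption of the example, and the variable names are illustrative.

```python
import math

# Spirit Airlines revenue (millions), 1998 through 2002.
revenue = [131, 227, 311, 354, 403]

# Ratio of each year's revenue to the preceding year's.
ratios = [later / earlier for earlier, later in zip(revenue, revenue[1:])]
periods = len(ratios)  # five observations span four periods

# Geometric mean of the ratios, minus 1, gives the average growth rate.
growth_from_ratios = math.prod(ratios) ** (1 / periods) - 1

# The intermediate years cancel, so only the endpoints matter.
growth_from_endpoints = (revenue[-1] / revenue[0]) ** (1 / periods) - 1

print(periods, round(growth_from_ratios, 3), round(growth_from_endpoints, 3))
```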
The midrange is the point halfway between the lowest
and highest values of X.
Easy to use but sensitive to extreme data values.
$\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$
For the J.D. Power quality data (n = 37):
$\text{Midrange} = \frac{x_1 + x_{37}}{2} = \frac{87 + 173}{2} = 130$
Here, the midrange (130) is higher than the mean
(125.38) or median (121).
Central Tendency
Midrange
To calculate the trimmed mean, first remove the highest
and lowest k percent of the observations.
For example, for the n = 68 P/E ratios, we want a 5
percent trimmed mean (i.e., k = .05).
To determine how many observations to trim, multiply k × n
= 0.05 × 68 = 3.4, which is rounded down to 3 observations.
So, we would remove the three smallest and three
largest observations before averaging the remaining
values.
Central Tendency
Trimmed Mean
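The trimming steps can be sketched as follows. The function name `trimmed_mean` is hypothetical, and rounding k × n down to a whole number of observations follows the worked example (3.4 becomes 3).

```python
def trimmed_mean(values, k):
    """Mean after dropping the lowest and highest fraction k of the data."""
    data = sorted(values)
    trim = int(k * len(data))  # observations to drop from each end, rounded down
    if 2 * trim >= len(data):
        raise ValueError("k is too large: no observations would remain")
    kept = data[trim:len(data) - trim]
    return sum(kept) / len(kept)


# The 68 P/E ratios from the earlier example.
pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
      14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]

print(round(trimmed_mean(pe, 0.05), 2))  # three values dropped from each end
```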
Here is a summary of all the measures of central
tendency for the n = 68 P/E values.
The trimmed mean mitigates the effects of very high
values, but still exceeds the median.
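| Measure | Value | Excel Formula |
|---|---|---|
| Mean | 22.72 | =AVERAGE(PERatio) |
| Median | 19.00 | =MEDIAN(PERatio) |
| Mode | 13.00 | =MODE(PERatio) |
| Geometric Mean | 19.85 | =GEOMEAN(PERatio) |
| Midrange | 49.00 | =(MIN(PERatio)+MAX(PERatio))/2 |
| 5% Trim Mean | 21.10 | =TRIMMEAN(PERatio,0.1) |
Excel's TRIMMEAN takes the total fraction to exclude, so 0.1 trims 5 percent from each end.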
Central Tendency
Trimmed Mean
The Federal
Reserve uses a
16% trimmed
mean to mitigate
the effects of
extremes in its
analysis of the
Consumer Price
Index.
Variation is the spread of data points about the
center of the distribution in a sample. Consider
the following measures of dispersion:
| Statistic | Formula | Excel | Pro | Con |
|---|---|---|---|---|
| Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values. |
| Variance ($s^2$) | $\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning. |
| Standard deviation ($s$) | $\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ | =STDEV(Data) | Most common measure. Uses same units as the raw data (dollars, etc.). | Non-intuitive meaning. |
| Coefficient of variation (CV) | $100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets. | Requires non-negative data. |
| Mean absolute deviation (MAD) | $\frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks nice theoretical properties. |
Dispersion
Measures of Variation
The difference between the largest and smallest
observation.
$\text{Range} = x_{\max} - x_{\min}$
For example, for the n = 68 P/E ratios,
Range = 91 - 7 = 84
Dispersion
Range
The population variance ($\sigma^2$) is
defined as the sum of squared
deviations around the mean
divided by the population size.
$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
For the sample variance ($s^2$), we
divide by n - 1 instead of n;
otherwise $s^2$ would tend to
underestimate the unknown
population variance $\sigma^2$.
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
Dispersion
Variance
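The difference between the two divisors can be seen in a small sketch. It reuses the five quiz scores from the earlier deviation example (mean 65) and, purely for illustration, treats them once as a whole population and once as a sample. The function names are hypothetical.

```python
import math


def population_variance(values):
    """Sum of squared deviations around the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations around the mean, divided by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


scores = [42, 60, 70, 75, 78]
pop_var = population_variance(scores)
samp_var = sample_variance(scores)

# Square roots give the corresponding standard deviations.
print(pop_var, samp_var, round(math.sqrt(samp_var), 2))
```

Dividing by n - 1 always gives the larger of the two values, which offsets the tendency to underestimate the population variance.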
The square root of the variance.
Units of measure are the same as X.
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
Explains how individual values in a data set vary
from the mean.
Dispersion
Standard Deviation
Excel's built-in functions are:
| Statistic | Excel population formula | Excel sample formula |
|---|---|---|
| Variance | =VARP(Array) | =VAR(Array) |
| Standard deviation | =STDEVP(Array) | =STDEV(Array) |
Dispersion
Standard Deviation
Consider the following five quiz scores for
Stephanie.
Dispersion
Calculating a Standard Deviation
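Now, calculate the sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{2380}{5-1}} = \sqrt{595} = 24.39$
Somewhat easier, the two-sum formula can also be
used:
$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{28300 - \frac{(360)^2}{5}}{5-1}} = \sqrt{\frac{28300 - 25920}{5-1}} = \sqrt{595} = 24.39$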
Dispersion
Calculating a Standard Deviation
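The two-sum formula needs only three totals, so it can be sketched without the individual scores. The values n = 5, sum of x = 360, and sum of x squared = 28300 are taken from the calculation above. The variable names are illustrative.

```python
import math

n = 5           # number of quiz scores
sum_x = 360     # sum of the scores
sum_x2 = 28300  # sum of the squared scores

# Sum of squared deviations via the two-sum shortcut.
ss = sum_x2 - sum_x ** 2 / n

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(ss / (n - 1))

print(ss, round(s, 2))
```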
The standard deviation is nonnegative because
deviations around the mean are squared.
When every observation is exactly equal to the
mean, the standard deviation is zero.
Standard deviations can be large or small,
depending on the units of measure.
Compare standard deviations only for data sets
measured in the same units and only if the
means do not differ substantially.
Dispersion
Calculating a Standard Deviation
Useful for comparing variables measured in
different units or with different means.
A unit-free measure of dispersion
Expressed as a percent of the mean.
Only appropriate for nonnegative data. It is
undefined if the mean is zero or negative.
$CV = 100 \times \frac{s}{\bar{x}}$
Dispersion
Coefficient of Variation
For example:
Defect rates (n = 37): s = 22.89, $\bar{x}$ = 125.38 gives CV = 100 × (22.89)/(125.38) = 18%
ATM deposits (n = 100): s = 280.80, $\bar{x}$ = 233.89 gives CV = 100 × (280.80)/(233.89) = 120%
P/E ratios (n = 68): s = 14.08, $\bar{x}$ = 22.72 gives CV = 100 × (14.08)/(22.72) = 62%
Dispersion
Coefficient of Variation
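The three comparisons can be reproduced with a short helper. The function name `coefficient_of_variation` is hypothetical. The inputs are the standard deviations and means from the examples, with s = 14.08 for the P/E ratios, the value used in that CV calculation.

```python
def coefficient_of_variation(s, mean):
    """Standard deviation expressed as a percent of the mean."""
    if mean <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * s / mean


# (standard deviation, mean) pairs taken from the examples.
examples = {
    "Defect rates": (22.89, 125.38),
    "ATM deposits": (280.80, 233.89),
    "P/E ratios": (14.08, 22.72),
}

for name, (s, mean) in examples.items():
    print(name, round(coefficient_of_variation(s, mean)), "%")
```

Because the result is unit-free, the three data sets can be compared directly even though their units and means differ.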
The Mean Absolute Deviation (MAD) reveals the
average distance from an individual data point to
the mean (center of the distribution).
Uses absolute values of the deviations around
the mean.
Excel's function is =AVEDEV(Array)
$MAD = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$
Dispersion
Mean Absolute Deviation
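A minimal sketch of the MAD calculation follows, using the five quiz scores from the earlier deviation example (mean 65) as illustrative input. The function name is hypothetical.

```python
import math


def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    if not values:
        raise ValueError("MAD of an empty data set is undefined")
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


scores = [42, 60, 70, 75, 78]
mad = mean_absolute_deviation(scores)
print(mad)  # absolute deviations 23, 5, 5, 10, 13 averaged over 5 scores
```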
Consider the histograms of hole diameters drilled
in a steel plate during manufacturing.
The desired distribution is outlined in red.
[Figure: histograms of hole diameters for Machine A and Machine B]
Machine A: desired mean (5 mm) but too much variation.
Machine B: acceptable variation but mean is less than 5 mm.
Take frequent samples to monitor quality.
Dispersion
Central Tendency vs. Dispersion:
Manufacturing
Consider student ratings of four professors on
eight teaching attributes (10-point scale).
Jones and Wu have identical means but different
standard deviations.
Smith and Gopal have different means but identical
standard deviations.
A high mean (better rating) and low standard
deviation (more consistency) is preferred. Which
professor do you think is best?
Dispersion
Central Tendency vs. Dispersion:
Job Performance
Happy studying
