
Investment Tools

Statistics
SASF CFA Quant. Review
Statistical Concepts
Population is defined as all members of a specified group.
Sample is a subset of a defined population.
A frequency distribution is a tabular display of data summarized into a
relatively small number of intervals. It lists the intervals together with
the corresponding measures of frequency for the variable of interest.
A histogram is the graphical equivalent of a frequency distribution; it
is a bar chart in which continuous data on a random variable's
observations have been grouped into intervals.
A frequency polygon is the line-graph equivalent of a frequency
distribution; it joins the frequency for each interval, plotted at the
midpoint of that interval.
Frequency Distribution Table
Raw Data:
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class          Frequency
15 but < 25    3
25 but < 35    5
35 but < 45    2
Frequency Distribution Table Steps
1. Determine Range
2. Select Number of Classes
   Usually Between 5 & 15 Inclusive
3. Compute Class Intervals (Width)
4. Determine Class Boundaries (Limits)
5. Compute Class Midpoints
6. Count Observations & Assign to Classes
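The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` and its parameters are made up for this example, and the lower boundary (15), class width (10) and number of classes (3) are taken from the table on the earlier slide rather than computed from the range.

```python
def frequency_distribution(data, lower, width, n_classes):
    """Count observations in classes [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - lower) // width)  # index of the class that contains x
        if 0 <= k < n_classes:         # ignore values outside all classes
            counts[k] += 1
    rows = []
    for k, count in enumerate(counts):
        lo = lower + k * width         # lower class boundary (included)
        hi = lo + width                # upper class boundary (excluded)
        midpoint = (lo + hi) / 2
        rows.append((lo, hi, midpoint, count, count / len(data)))
    return rows


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = frequency_distribution(raw, lower=15, width=10, n_classes=3)
for lo, hi, midpoint, count, rel in table:
    print(f"{lo} but < {hi}: midpoint {midpoint}, frequency {count}, relative {rel:.0%}")
```

The counts come out as 3, 5 and 2, matching the table built from the raw data.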
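Histogram
[Figure: histogram of the frequency distribution below. Vertical axis: count (frequency, relative frequency, or percent), 0 to 5. Horizontal axis: lower class boundary (0, 15, 25, 35, 45, 55). The bars touch.]
Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2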
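Frequency Polygon
[Figure: frequency polygon of the frequency distribution below. Vertical axis: count (frequency, relative frequency, or percent), 0 to 5. Horizontal axis: 0 to 60, with points plotted at the class midpoints. Labels mark a midpoint and a fictitious class.]
Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2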
Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape
Measures of Central Tendency
Measures of central tendency summarize the location on which the data are
centered.
Population Mean: calculated as
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} X_i \]
where there are N members in the population and each observation is \(X_i\),
i = 1, 2, ..., N.
Sample Mean: calculated as
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
where there are n observations in the sample and each observation is \(X_i\),
i = 1, 2, ..., n. It is also the arithmetic mean of the sample observations.
Median: the middle observation in a group that has been ordered in either
ascending or descending order.
In an odd-numbered group this is the observation in the (n+1)/2 position.
In an even-numbered group it is the average of the values in the n/2 and
(n/2)+1 positions.
Mode: the most frequently occurring value in the distribution. A distribution
may have one, more than one, or no mode.
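As a rough illustration of these definitions, the sketch below computes the mean, median and mode(s) of the raw data from the frequency-table slide. The `median` helper is written out by hand only to show the position rule; `statistics.multimode` is the standard-library call (Python 3.8+) that returns every most-frequent value.

```python
import statistics


def median(values):
    """Middle observation of the ordered data (average of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # 1-based position (n+1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2   # 1-based positions n/2 and n/2 + 1


data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

sample_mean = sum(data) / len(data)     # arithmetic mean
sample_median = median(data)
modes = statistics.multimode(data)      # all values tied for highest frequency

print(sample_mean, sample_median, modes)
```

For this data set the mean is 29, the median is 27, and there are two modes (24 and 27), which shows that a distribution can have more than one mode.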
Other Definitions for Means
Measures of central tendency summarize the location on which the
data are centered.
Weighted Mean: calculated as
\[ \bar{X}_{Weighted} = \sum_{i=1}^{n} w_i X_i \]
where there are n observations, each observation is \(X_i\), and the weight
associated with each observation is \(w_i\), i = 1, 2, ..., n. If \(w_i = 1/n\),
then this is the sample mean. If \(w_i\) is the probability of \(X_i\) occurring,
then this weighted mean is the expected value of the random variable X.
Geometric Mean: calculated as
\[ G = \sqrt[n]{X_1 X_2 \cdots X_n} \]
where there are n observations and each observation is \(X_i\).
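A minimal sketch of both means follows. The function names and the two sample values are invented for illustration; the geometric mean is computed as the n-th root of the product, which assumes all observations are positive.

```python
import math


def weighted_mean(values, weights):
    """Sum of w_i * X_i; the weights are assumed to sum to 1."""
    return sum(w * x for w, x in zip(weights, values))


def geometric_mean(values):
    """n-th root of the product of n positive observations."""
    return math.prod(values) ** (1 / len(values))


values = [2.0, 8.0]                  # hypothetical observations
equal_weights = [0.5, 0.5]           # w_i = 1/n reproduces the sample mean

print(weighted_mean(values, equal_weights))   # same as the arithmetic mean
print(geometric_mean(values))                 # never larger than the arithmetic mean
```

With equal weights the weighted mean collapses to the sample mean (5 here), while the geometric mean of the same two values is 4.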
Measures of Dispersion
Range: the difference between the maximum and minimum values in
a dataset.
Mean Absolute Deviation: the average of the data's absolute
deviations from the mean.
\[ MAD = \frac{1}{n} \sum_{i=1}^{n} \left| X_i - \bar{X} \right| \]
Population Variance: the average of the population's squared
deviations from the mean.
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \mu \right)^2 \]
The population standard deviation is simply the square root of the
population variance.
Sample Variance: the sum of the sample data's squared deviations from
the sample mean, divided by n - 1.
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \]
The sample standard deviation is simply the square root of the sample
variance.
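The sketch below evaluates each dispersion measure on the raw data from the frequency-table slide. Treating the same ten numbers first as a population and then as a sample is purely for illustration; the function name `dispersion` and the dictionary keys are assumptions of this example.

```python
import math


def dispersion(data):
    """Range, MAD, population variance and sample variance of a data set."""
    n = len(data)
    mean = sum(data) / n
    squared = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return {
        "range": max(data) - min(data),
        "mad": sum(abs(x - mean) for x in data) / n,
        "population_variance": squared / n,        # divide by N
        "sample_variance": squared / (n - 1),      # divide by n - 1
    }


data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
result = dispersion(data)
population_std = math.sqrt(result["population_variance"])
sample_std = math.sqrt(result["sample_variance"])
print(result, population_std, sample_std)
```

The only difference between the two variances is the divisor, so the sample variance is always the larger of the two for the same data.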
Useful Measures for Returns
Holding Period Return: is expressed in percent terms, i.e.
independent of currency units, and is calculated over a period of time.
\[ R_t = \frac{(P_t - P_{t-1}) + D_t}{P_{t-1}} \]
Holding Period Return = \(R_t\)
Share Price at end of time t = \(P_t\)
Share Price at end of time t-1 = \(P_{t-1}\)
Cash Distributions during period t = \(D_t\)
Holding Period Return, \(R_t\), consists of the capital gain over the period
plus the distributions during the period, both divided by the beginning
price (capital gain yield plus distribution yield).
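A short sketch of the holding period return formula is given here. The prices and the distribution are hypothetical numbers chosen for the example, and the function name is an assumption.

```python
def holding_period_return(p_begin, p_end, distributions=0.0):
    """(P_t - P_{t-1} + D_t) / P_{t-1}, returned as a decimal fraction."""
    return (p_end - p_begin + distributions) / p_begin


# Hypothetical share: bought at 100, ends the period at 105, pays 2 in cash.
r = holding_period_return(p_begin=100.0, p_end=105.0, distributions=2.0)
capital_gain_yield = (105.0 - 100.0) / 100.0
distribution_yield = 2.0 / 100.0
print(f"{r:.2%} = {capital_gain_yield:.2%} + {distribution_yield:.2%}")
```

The result splits cleanly into a capital gain component and a distribution component, each measured against the beginning price.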
Measures of Risk vs. Return
Coefficient of variation,
\[ CV = \frac{s}{\bar{X}} \]
CV shows relative dispersion. If X is the return on an asset, then CV shows
the amount of risk (measured by the sample standard deviation s) for every
percent of mean return on the asset. The lower an asset's CV, the more
attractive it is in terms of risk per unit of return.
Sharpe measure,
\[ SM = \frac{r_p - r_f}{\sigma_p} \]
SM is a more precise return-risk measure because it takes into account that
an investor can earn the risk-free rate, \(r_f\), without bearing any risk.
Hence a portfolio's risk (measured by its standard deviation \(\sigma_p\))
must be compared to its return in excess of the risk-free rate, \(r_p - r_f\).
The higher the SM, the better the return-risk tradeoff on the portfolio for
an investor.
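Both ratios can be sketched with the standard library. The return series and the 3% risk-free rate are hypothetical, and using the sample mean and sample standard deviation of past returns as the inputs is an assumption of this example rather than something the slide specifies.

```python
import statistics


def coefficient_of_variation(returns):
    """Sample standard deviation per unit of mean return."""
    return statistics.stdev(returns) / statistics.mean(returns)


def sharpe_measure(returns, risk_free_rate):
    """Mean return in excess of the risk-free rate per unit of standard deviation."""
    excess = statistics.mean(returns) - risk_free_rate
    return excess / statistics.stdev(returns)


portfolio_returns = [0.10, 0.02, 0.06, 0.14, 0.08]   # hypothetical annual returns
cv = coefficient_of_variation(portfolio_returns)
sm = sharpe_measure(portfolio_returns, risk_free_rate=0.03)
print(f"CV = {cv:.3f}, Sharpe measure = {sm:.3f}")
```

A lower CV and a higher Sharpe measure both point to less risk per unit of return, but only the Sharpe measure credits the investor for the return available without risk.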
Shape
1. Describes How Data Are Distributed
2. Measures of Shape
   Kurtosis = How Peaked or Flat
   Skew = Symmetry
[Figure: three distributions. Symmetric: Mean = Median = Mode. Negative-skewed: Mean, Median, Mode from left to right. Positive-skewed: Mode, Median, Mean from left to right.]
Measures of Shape
A frequency distribution that is not symmetric is skewed.
A positively-skewed distribution is characterized by many small losses but a few
extremely large gains. It has a long tail on the right-hand side of the distribution.
A negatively-skewed distribution is characterized by many small gains but a few
extremely large losses. It has a long tail on the left-hand side of the distribution.
Skewness arises as a result of the properties of asset prices and returns. A
share price can never be negative, so there is a lower limit on the asset's
return (-100%) but no theoretical upper limit; an asset's returns may
therefore be positively skewed.
i. Symmetrical distribution: Mean = Median = Mode
ii. Positively-skewed distribution: Mean > Median > Mode
iii. Negatively-skewed distribution: Mean < Median < Mode
Measures of Shape
A frequency distribution that is more or less peaked than a
Normal distribution is said to exhibit kurtosis. If the
distribution is more peaked than a Normal (i.e. exhibits fat
tails) it is leptokurtic. If it is less peaked than a Normal it is
called platykurtic.
Positive excess kurtosis, i.e. a leptokurtic distribution, means that large
positive and negative deviations from the mean have higher
probabilities of occurring than they would under a Normal
distribution.
If a portfolio's returns are leptokurtic, then its true risk is higher than
the risk suggested by an analysis that assumes returns are Normally
distributed. This is important for Value at Risk (VaR) calculations, which
must assume distributions for the asset returns in a portfolio.
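The slides give no formulas for skewness or kurtosis, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3 (the value for a Normal distribution). Both data sets are hypothetical.

```python
def central_moment(data, k):
    """Average of the k-th powers of deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a Normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [-2, -1, 0, 1, 2]
right_tailed = [-0.02, -0.01, -0.01, 0.0, 0.30]   # many small losses, one large gain

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric set has zero skewness and negative excess kurtosis (platykurtic), while the set with one large gain has positive skewness.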
Frequencies
19. An analyst gathered the following data:
63.5    96.9   112.3   134.1
66.4    98.3   116.2   138.5
75.6    99.5   116.9   139.8
77.5   100.7   118.3   140.7
84.4   102.0   122.0   143.0
87.6   105.5   122.2   153.9
89.9   108.4   124.5   155.5
In constructing a frequency distribution using five classes, if the
first class is "60 up to 80," the class frequency of the third class is:
A. 4.
B. 5.
C. 6.
D. 8.

Five classes as follows:
1. 60 ≤ x < 80
2. 80 ≤ x < 100
3. 100 ≤ x < 120
4. 120 ≤ x < 140
5. 140 ≤ x < 160
The third class contains 100.7, 102.0, 105.5, 108.4, 112.3, 116.2, 116.9
and 118.3. Hence there are 8 observations in the third class (answer D).
Note the misleading way the question is asked! Always read
the question carefully!
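The count can be checked with a few lines of Python. The class boundaries assume that "60 up to 80" means 60 is included and 80 is excluded; none of the observations falls exactly on a boundary, so the result does not depend on that choice.

```python
data = [
    63.5, 96.9, 112.3, 134.1,
    66.4, 98.3, 116.2, 138.5,
    75.6, 99.5, 116.9, 139.8,
    77.5, 100.7, 118.3, 140.7,
    84.4, 102.0, 122.0, 143.0,
    87.6, 105.5, 122.2, 153.9,
    89.9, 108.4, 124.5, 155.5,
]

width = 20
# One count per class: 60-80, 80-100, 100-120, 120-140, 140-160.
class_counts = [
    sum(1 for x in data if lower <= x < lower + width)
    for lower in range(60, 160, width)
]
third_class = sorted(x for x in data if 100 <= x < 120)

print(class_counts)       # [4, 6, 8, 6, 4]
print(len(third_class))   # 8
```

The frequencies of the other classes (4 and 6) also appear among the answer choices, which is why the question has to be read carefully.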
Geometric Mean
21. A portfolio of non-dividend-paying stocks earned a geometric
mean return of 5 percent between January 1, 1995, and December
31, 2001. The arithmetic mean return for the same period was 6
percent. If the market value of the portfolio at the beginning of 1995
was $100,000, the market value of the portfolio at the end of 2001
was closest to:
A. $135,000.
B. $140,710.
C. $142,000.
D. $150,363.
Identify what you are being asked for:
Portfolio ending value, \(P_{12/31/2001}\)
Given the following:
Portfolio beginning value = \(P_{1/1/1995}\) = $100,000
Geometric mean return = 5%
Arithmetic mean return = 6%
Number of periods = 7
Non-dividend-paying stocks in portfolio.
Identify the correct approach: use the geometric mean return and the formula
\[ P_{t+7} = (1+r)^7 P_t \]
\[ P_{t+7} = (1.05)^7 \times \$100{,}000 \approx \$140{,}710 \]
The answer is B.
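The compounding step can be verified directly. The function name `ending_value` is an assumption of this sketch; the inputs come from the question.

```python
def ending_value(beginning_value, mean_return, periods):
    """Compound a beginning value at a constant per-period return."""
    return beginning_value * (1 + mean_return) ** periods


with_geometric = ending_value(100_000, 0.05, 7)    # correct: geometric mean return
with_arithmetic = ending_value(100_000, 0.06, 7)   # incorrect: arithmetic mean return

print(round(with_geometric))    # 140710
print(round(with_arithmetic))   # 150363
```

Compounding the arithmetic mean instead reproduces distractor D, which is why the question supplies both means.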

Other Questions
23. Which of the following statements
about standard deviation is TRUE?
Standard deviation:
A. is the square of the variance.
B. can be a positive or a negative
number.
C. is denominated in the same
units as the original data.
D. is the arithmetic mean of the
squared deviations from the mean.
25. A stock with a coefficient of variation
of 0.50 has a(n):
A. variance equal to half the stock's
expected return.
B. expected return equal to half the
stock's variance.
C. expected return equal to half the
stock's standard deviation.
D. standard deviation equal to half
the stock's expected return.

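If
\[ CV = \frac{s}{\bar{X}} = \frac{1}{2} \]
then
\[ s = \frac{1}{2} \bar{X} \]
so the standard deviation equals half the expected return (answer D).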
Simple Linear Regression
Linear Equations & Regression
[Figure: the line Y = mX + b, with X on the horizontal axis and Y on the vertical axis; b = Y-intercept, m = slope = (change in Y) / (change in X).]
1. Answer to "What Is the Relationship Between the Variables?"
2. Regression Equation Used
   1 Numerical Dependent (Response) Variable
      Variable to be Predicted
   1 or More Numerical or Categorical Independent (Explanatory) Variables
3. Used to Test Theories and for Prediction
Linear Regression Model
Relationship Between Variables Is a Linear Function
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
\(Y_i\): Dependent (Response) Variable
\(X_i\): Independent (Explanatory) Variable
\(\beta_0\): Population Y-Intercept
\(\beta_1\): Population Slope
\(\varepsilon_i\): Random Error
Probabilistic Models
Hypothesize 2 Components involved in explaining the
behavior of a variable of interest.
Deterministic: based on relevant theory
Random Error: reflects unknown elements
Example: Want to explain the return on a company's
stock.
Theory: Return on Company j is 1.50 Times Return on
Overall Stock Market Plus Random Error
Probabilistic Model:
\[ R_j = 1.5 R_{Mkt} + \varepsilon_j \]
Random Error May Be Due to Company-specific Factors.
Scatter Diagram
[Figure: scatter plot of Y (vertical axis, 0 to 60) against X (horizontal axis, 0 to 30).]
1. Plot of All (\(X_i\), \(Y_i\)) Pairs
2. Suggests How Well Model Will Fit
How Would You Draw a Line Through the Points?
How Do You Determine Which Line Fits Best?
Ordinary Least Squares, LS
1. Best Fit Means Differences Between Actual Values
(\(Y_i\)) & Predicted Values (\(\hat{Y}_i\)) Are a Minimum
But Positive Differences Off-Set Negative Ones, so use squared
errors to determine the closest fitting line.
\[ \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 = \sum_{i=1}^{n} e_i^2 \]
2. Least Squares Regression (LS) Minimizes the Sum
of the Squared Differences (or Errors).
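Least Squares (LS) Graphically
[Figure: four data points scattered around the fitted line, with X on the horizontal axis and Y on the vertical axis; the vertical distances from the points to the line are the errors \(e_1\), \(e_2\), \(e_3\), \(e_4\).]
\[ Y_i = b_0 + b_1 X_i + e_i \]
\[ \hat{Y}_i = b_0 + b_1 X_i \]
LS Minimizes
\[ \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2 \]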
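A minimal sketch of a least squares fit on four points follows. The slides do not state how the coefficients are obtained, so this example assumes the standard closed-form solution for simple regression: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept makes the line pass through the point of means. The data points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for Y = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx              # slope
    b0 = y_bar - b1 * x_bar       # intercept: line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of e_i^2, where e_i = Y_i - (b0 + b1 * X_i)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]         # hypothetical X values
ys = [3.0, 5.0, 4.0, 8.0]         # hypothetical Y values

b0, b1 = least_squares(xs, ys)
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum_squared_errors(xs, ys, b0, b1)
print(b0, b1, errors, sse)
```

The four errors sum to zero, which is exactly why the raw differences cannot be used to rank lines, and any other slope or intercept gives a larger sum of squared errors on the same data.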
Interpretation of LS Coefficients
1. Slope (\(b_1\))
   Estimated Y changes by \(b_1\) for each 1-unit change in X
   If \(b_1\) = 2, then Company Return (Y) is expected to
   increase by 2 for each 1-unit increase in Market
   Return (X)
2. Y-Intercept (\(b_0\))
   Average Value of Y when X = 0
   If \(b_0\) = 2, then Average Company Return (Y) Is
   Expected to Be 2 When Market Return (X) Is 0
