

Assignment Walkthroughs and R Demo
W4290 Statistical Methods in Finance | Spring 2010 | Columbia University

Outline

• Assignment 1

• Assignment 2

• Principal Component Analysis (PCA) in R



Assignment 1

Assignment 1
• Problem 3.5

(a) Compute the sample mean μ̂ and the sample covariance matrix Σ̂ of the log returns.

To change the working directory:


#R code:
setwd(<working directory>)
# e.g. "C:/Documents and Settings/tsit/"

To input the data:

returns<-read.table("m_logret_10stocks.txt",
header=TRUE)[1:156,2:11]

Assignment 1
• Problem 3.5

(a) Compute the sample mean μ̂ and the sample covariance matrix Σ̂ of the log returns.

Use mean and var to obtain the estimates:

#R code:
mean(returns)   # column means; in newer versions of R use colMeans(returns)
var(returns)    # or cov(returns)

In MATLAB, however, for a matrix A, var(A) and cov(A) compute different things. Compare:

%MATLAB code:
A = [1,2;3,4;5,6;7,8]
var(A) % column-wise variances (a 1-by-n row vector)
cov(A) % the full n-by-n covariance matrix

Assignment 1
• Problem 3.5

(b) Assume the monthly target return is 0.3% and that short selling is allowed. Estimate the optimal portfolio weights by replacing (μ, Σ) in Markowitz's theory by (μ̂, Σ̂).

#R code:
no.of.stocks <- dim(returns)[2]
# For an m x n matrix M, dim(M) gives
# >> [1] m n
# Using "dim" can help you extract the size of the matrix.

%MATLAB code:
size(M, i)  % i: the i-th dimension

To facilitate the computation, we’ll write a function.



Assignment 1| Functions
• One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are themselves built from other functions. The structure of a function is given below.
myfunction <- function(arg1, arg2, ... ){
statements
return(object)
}
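As a quick illustration (a hypothetical toy example, not part of the assignment), here is how such a function could be defined and called:

#R code:
excess.return <- function(x, target = 0){
  # x: a numeric vector of returns; target: a benchmark return
  ex <- x - target
  return(ex)
}
excess.return(c(0.01, -0.02, 0.005), target = 0.003)
# >> [1]  0.007 -0.023  0.002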

• We can therefore write a function, say Markowitz, that computes the efficient weights automatically once we input the returns dataset and the target return (the arguments).

• The exact solution can be found on pp. 70-71 of Lai & Xing (2008) (Eq. (3.9)-(3.15)).

Assignment 1| Markowitz
#R code:
no.of.stocks <- dim(returns)[2]
Markowitz <- function(return.set, tar.re){
  mu <- mean(return.set)        # column means
  Sigma <- var(return.set)      # sample covariance matrix
  inv.Sigma <- solve(Sigma)     # matrix inverse (MATLAB: inv(Sigma))
  A <- sum(mu%*%inv.Sigma)      # %*% is matrix multiplication
                                # (in MATLAB, compare "*" and ".*")
  B <- mu%*%inv.Sigma%*%mu
  C <- sum(inv.Sigma)
  D <- B*C - A*A
  vol.eff <- sqrt((C*tar.re^2 - 2*tar.re*A + B)/D)
  weights <- matrix(NA, length(tar.re), no.of.stocks)
  for (i in 1:length(tar.re))
    weights[i,] <- (as.numeric(B/D)*inv.Sigma%*%rep(1,10)
                    - as.numeric(A/D)*inv.Sigma%*%mu
                    + tar.re[i]*(as.numeric(C/D)*inv.Sigma%*%mu
                                 - as.numeric(A/D)*inv.Sigma%*%rep(1,10)))
  list(vol.eff=vol.eff, weights=weights)  # return both as a list
}

Assignment 1| Markowitz
#R code:
target.return<-log(1+0.003)
eff.frontier<-Markowitz(returns, target.return)
eff.frontier$weights

mean.sigma<-sqrt(
diag(eff.frontier$weights%*%var(returns)
%*%t(eff.frontier$weights))
)
# ------------------------------------------------------------
# >eff.frontier$weights
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.07135497 -0.02974003 0.6499748 -0.02362635 -0.1913992 0.07879041
# [,7] [,8] [,9] [,10]
# [1,] 0.1617200 0.0964938 0.1630675 0.02336413

• The function diag helps us extract the diagonal elements from a matrix. (It is faster than using a double for-loop.)

Assignment 1| Bootstrap
• Problem 3.5

(c) Do the same as in (b) for Michaud's resampled weights (3.38) using B = 500 bootstrap samples:

\bar{\omega} = B^{-1} \sum_{b=1}^{B} \hat{\omega}^{*}_{b}

Bootstrap algorithm:
1. Set up an index vector: index = {1, 2, …, no.of.obs (156)}.
2. From this index set, randomly draw 156 indices with replacement.
3. Suppose {2, 5, 1, 156, 2, 4, …, 20} is drawn. We copy the corresponding months of returns of ALL 10 stocks from the original data set, following the sequence of indices drawn. In this case, months 2, 5, 1, … of the original data set become the first, second, third, … rows of our new, resampled dataset.

Assignment 1| Bootstrap
• To implement it in R…
#R code:
n.sims <- 500
index <- seq(1, dim(returns)[1])
draws <- matrix(sample(index, size=n.sims*length(index),
                       replace=TRUE), n.sims)
weights.Michaud <- matrix(NA, n.sims, no.of.stocks)
for (b in 1:n.sims){
  portfolio <- returns[draws[b,],]
  weights.Michaud[b,] <- t(Markowitz(portfolio,
                                     target.return)$weights)
}
apply(weights.Michaud, 2, mean)  # Michaud's resampled weights

If you prefer to use MATLAB, you may try:

%MATLAB code:
index = randsample(1:no.of.obs, m, true)
%...

Assignment 1| Plots
• Problem 3.5

(d) Plot the estimated efficient frontier (by varying μ* over a grid) that uses (μ̂, Σ̂) to replace (μ, Σ) in Markowitz's efficient frontier.

We need to generate a sequence of target returns, plug each of them into Markowitz, and obtain one set of efficient weights for each target:

#R code:
mus <- seq(-0.01, 0.015, length.out=100)
vol <- Markowitz(returns, mus)$vol.eff
plot(vol, mus, type="l", xlab="Volatility", ylab="Return")

%MATLAB code:
mus=linspace(-0.01,0.015,100)
%...
plot(mus,vol)
hold on

Assignment 1| Plots
• Problem 3.5

(e) Plot Michaud's resampled efficient frontier using B = 500 bootstrap samples. Compare with the plot in (d).

#R code:
index <- seq(1, dim(returns)[1])
draws <- matrix(sample(index, size=n.sims*length(index),
                       replace=TRUE), n.sims)
weights.Bootstrap <- matrix(0, length(mus), dim(returns)[2])
for (b in 1:n.sims){
  resampled.returns <- returns[draws[b,],]
  weights.Bootstrap <- weights.Bootstrap +
    Markowitz(resampled.returns, mus)$weights
}
weights.Bootstrap <- weights.Bootstrap/n.sims
mean.mu <- t(weights.Bootstrap%*%mean(returns))
mean.sigma <- sqrt(diag(weights.Bootstrap%*%
                        var(returns)%*%t(weights.Bootstrap)))
points(mean.sigma, mean.mu, type="l", col="Red")

Assignment 1| Plots

Efficient frontiers obtained from different numbers of resamplings: red = 10, blue = 500 (left); efficient frontiers obtained from each resampled dataset (right).

Assignment 2

Assignment 2
• Problem 3.6

(a) Fit the CAPM to the ten stocks. Give point estimates and 95% confidence intervals of α, β, the Sharpe index, and the Treynor index.

We have to transform the interest rate from an annual return into a monthly return so that it is compatible with the monthly returns of the stocks:

(1 + r_{\text{monthly}})^{12} = 1 + r_{\text{annual}}


#R code:
marketreturns[,2] <- (1+marketreturns[,2] / 100)^(1/12)-1
marketexcess <- marketreturns[,1]- marketreturns[,2]
stockexcess <- #similarly…

n.of.stocks <- #use dim(stockexcess)[?]


n.of.obs <- #...

Assignment 2
• The parameters in the CAPM can be obtained by running linear regressions, lm(y ~ x), of the excess equity returns on the excess market returns for each of the ten stocks in the data set.

#R code:
# Remember to set up vectors/matrices (alpha.coef, alpha.sd, beta.coef,
# beta.sd, rres, sharpe.coef, treynor.coef) before the loop.
for(i in 1:n.of.stocks){
  llm <- summary(lm(stockexcess[,i] ~ marketexcess))
  alpha.coef[i] <-
    # try names(llm) and use llm$<…> to extract useful info.
  alpha.sd[i] <- #...
  beta.coef[i] <- #...
  beta.sd[i] <- #...
  rres[i,] <- #...
  sharpe.coef[i] <- mean(stockexcess[,i])/sd(stockexcess[,i])
  treynor.coef[i] <- mean(stockexcess[,i])/beta.coef[i]
  #...
}

#names(llm)
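One possible way to fill in these placeholders (a sketch, assuming stockexcess, marketexcess, n.of.stocks and n.of.obs are set up as above):

#R code:
alpha.coef <- alpha.sd <- beta.coef <- beta.sd <- rep(NA, n.of.stocks)
sharpe.coef <- treynor.coef <- rep(NA, n.of.stocks)
rres <- matrix(NA, n.of.stocks, n.of.obs)
for (i in 1:n.of.stocks){
  llm <- summary(lm(stockexcess[,i] ~ marketexcess))
  alpha.coef[i] <- llm$coefficients[1,1]  # intercept estimate (alpha)
  alpha.sd[i]   <- llm$coefficients[1,2]  # its standard error
  beta.coef[i]  <- llm$coefficients[2,1]  # slope estimate (beta)
  beta.sd[i]    <- llm$coefficients[2,2]
  rres[i,]      <- llm$residuals          # residuals, reused in part (d)
  sharpe.coef[i]  <- mean(stockexcess[,i])/sd(stockexcess[,i])
  treynor.coef[i] <- mean(stockexcess[,i])/beta.coef[i]
}
# 95% CI for alpha, e.g.: alpha.coef + qt(0.975, n.of.obs-2) * alpha.sd %o% c(-1, 1)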

Assignment 2 | Delta Method


• Because it is not easy to calculate Var(μ̂/σ̂) and Var(μ̂/β̂) directly, we use the Delta Method to approximate these values.

• What is the Delta Method? According to Lai & Xing (2008), page 59:

g(\hat{\theta}) - g(\theta_0) \xrightarrow{D} N\!\left(0,\ (\nabla g(\hat{\theta}))' \left(-\nabla^2 l_n(\hat{\theta})\right)^{-1} \nabla g(\hat{\theta})\right)

… =_=’’

Assignment 2 | Delta Method


• In fact, you can view it as a way to obtain the variance through a Taylor series expansion.

Recall that if \hat{\theta}_n \approx \theta_0, then for any "smooth" function \varphi(\cdot),

\varphi(\hat{\theta}_n) - \varphi(\theta_0) \approx \varphi'(\theta_0)(\hat{\theta}_n - \theta_0) + \tfrac{1}{2}\varphi''(\theta_0)(\hat{\theta}_n - \theta_0)^2 + \dots

If we take the variance on both sides and treat \varphi^{(n)}(\theta_0) \approx \varphi^{(n)}(\hat{\theta}_n), then

\mathrm{Var}(\varphi(\hat{\theta}_n)) \approx [\varphi'(\hat{\theta}_n)]^2 \, \mathrm{Var}(\hat{\theta}_n - \theta_0).


Assignment 2 | Delta Method


• If we let \varphi(\mu, \sigma) = \mu/\sigma and again use a Taylor series expansion of the function, we will have

\varphi(\hat{\mu}, \hat{\sigma}) \approx \varphi(\mu, \sigma) + \varphi'_1(\mu, \sigma)(\hat{\mu} - \mu) + \varphi'_2(\mu, \sigma)(\hat{\sigma} - \sigma) + \dots

where

\varphi'_1 := \tfrac{\partial}{\partial x}\varphi(x, y) = y^{-1}, \qquad \varphi'_2 := \tfrac{\partial}{\partial y}\varphi(x, y) = -x y^{-2}.

\begin{aligned}
\mathrm{Var}(\hat{S}) &\approx \mathrm{Var}\left\{ \varphi(\mu, \sigma) + \varphi'_1(\mu, \sigma)(\hat{\mu} - \mu) + \varphi'_2(\mu, \sigma)(\hat{\sigma} - \sigma) + \dots \right\} \\
&= [\varphi'_1(\mu, \sigma)]^2 \mathrm{Var}(\hat{\mu}) + [\varphi'_2(\mu, \sigma)]^2 \mathrm{Var}(\hat{\sigma}) + 2\,\varphi'_1(\mu, \sigma)\varphi'_2(\mu, \sigma)\,\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \\
&\overset{\text{indep.}}{\approx} [\varphi'_1(\hat{\mu}, \hat{\sigma})]^2 \mathrm{Var}(\hat{\mu}) + [\varphi'_2(\hat{\mu}, \hat{\sigma})]^2 \mathrm{Var}(\hat{\sigma}) \\
&= \dots
\end{aligned}
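To make this concrete, here is a minimal R sketch of the delta-method standard error of the Sharpe ratio for one stock, assuming i.i.d. (approximately normal) excess returns so that Var(μ̂) ≈ σ̂²/n and Var(σ̂) ≈ σ̂²/(2n):

#R code:
x <- stockexcess[,1]              # excess returns of one stock
n <- length(x)
mu.hat <- mean(x); sigma.hat <- sd(x)
sharpe.hat <- mu.hat/sigma.hat
var.mu    <- sigma.hat^2/n        # Var of the sample mean
var.sigma <- sigma.hat^2/(2*n)    # approximate Var of the sample s.d.
var.sharpe <- (1/sigma.hat)^2*var.mu + (mu.hat/sigma.hat^2)^2*var.sigma
sd.sharpe  <- sqrt(var.sharpe)    # delta-method standard error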

Assignment 2
• The variance of the Treynor ratio can be computed in a similar fashion. Notice that for this ratio, the covariance term also vanishes.¹

• (b) Use the bootstrap procedure in Section 3.5 to estimate the standard errors of the point estimates of α, β, the Sharpe and Treynor indices.

#R code:
# This part will be very similar to part (c) in Problem 3.5. Make
# sure that you have set up matrices to store the estimated
# values. It is a combination of the bootstrap technique that you
# just learnt and the code for part (a) of Problem 3.6

¹ The covariance between the mean of the excess stock return and the beta estimate should also be 0.
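A minimal sketch of the bootstrap in (b), assuming the part (a) loop has been wrapped in a helper function, say capm.estimates(stockexcess, marketexcess), that returns the vector of point estimates (this helper name is hypothetical, not from the textbook):

#R code:
n.sims <- 500
boot.est <- matrix(NA, n.sims, 4*n.of.stocks)  # alpha, beta, Sharpe, Treynor
for (b in 1:n.sims){
  idx <- sample(1:n.of.obs, n.of.obs, replace=TRUE)
  boot.est[b,] <- capm.estimates(stockexcess[idx,], marketexcess[idx])
}
apply(boot.est, 2, sd)  # bootstrap standard errors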

Assignment 2
• Problem 3.6

(c) Test for each stock the null hypothesis α = 0. (Skipped)

(d) Use the regression model (3.24) to test for the ten stocks the null hypothesis α = 0.

This problem can be solved by using (3.26) and (3.28):

\hat{V} = n^{-1} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)(y_i - \hat{\alpha} - \hat{\beta} x_i)'

and

\frac{n-q-1}{q}\, \hat{\alpha}' \hat{V}^{-1} \hat{\alpha} \left\{ 1 + \frac{\bar{x}^2}{\frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2} \right\}^{-1} \sim F_{q,\, n-q-1}

under H0.

Assignment 2
• The corresponding R code is shown below. Notice that we can extract the residuals from the regression results; they were stored in rres in part (a).

#R code:
alpha.level <- 0.05

Vhat <- 1/n.of.obs * rres %*% t(rres)  # Formula (3.26)
nnumerator <- t(alpha.coef) %*% solve(Vhat) %*% alpha.coef
ddenominator <- 1 + mean(marketexcess)^2 /
  (var(marketexcess)*(n.of.obs-1)/n.of.obs)
dof <- n.of.obs - n.of.stocks - 1
sstat <- dof/n.of.stocks * nnumerator / ddenominator  # Formula (3.28)
fstat <- qf(1-alpha.level, n.of.stocks, dof)
print(sstat)  # reject H0: alpha = 0 if sstat > fstat
print(fstat)

Assignment 2
• Problem 3.6

(e) Perform a factor analysis on the excess returns of the ten stocks. Show the factor loadings and the rotated factor loadings. Explain your choice of the number of factors.

Use factanal and varimax:

#R code:
ana2<-factanal(stockexcess, factors=2, rotation="none")
rot2<-varimax(loadings(ana2), normalize = TRUE)

#... Try factors=3,4… and discuss/argue.
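One possible way to compare different numbers of factors (a sketch, using the likelihood-ratio test p-value that factanal reports):

#R code:
for (k in 1:4){
  fit <- factanal(stockexcess, factors=k)
  cat(k, "factor(s): p-value of the 'k factors are sufficient' test:", fit$PVAL, "\n")
}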



Assignment 2 | Testing
• Problem 3.6

(f) Consider the model

r^e_t = \beta_1 1_{\{t < t_0\}} r^e_{M,t} + \beta_2 1_{\{t \ge t_0\}} r^e_{M,t} + \epsilon_t,

in which r^e_t = r_t - r_f and r^e_{M,t} = r_{M,t} - r_f are the excess returns of the stock and the S&P500 index. The model suggests that the β in the CAPM might not be a constant (i.e. β1 ≠ β2). Taking February 2001 as the month t0, test for each stock the null hypothesis that β1 = β2.

Hint: Transform the model into

r^e_t = (\beta_1 - \beta_2) 1_{\{t < t_0\}} r^e_{M,t} + \beta_2 r^e_{M,t} + \epsilon_t,

and test for β1 - β2 = 0.
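A minimal sketch of this test for stock i (assuming t0 is set to the row index of February 2001 in your sample, and using the objects defined on the earlier slides):

#R code:
ind <- as.numeric(seq_len(n.of.obs) < t0)  # the indicator 1{t < t0}
fit <- summary(lm(stockexcess[,i] ~ marketexcess + I(ind*marketexcess)))
fit$coefficients[3,]  # row 3: estimate of beta1 - beta2, its s.e., t value, p-value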



Assignment 2 | Change point


• Problem 3.6

(g) Estimate t0 in (f) by the least squares criterion that minimises the residual sum of squares over the parameters.

#R code:
changepoint <- function(i, t0)
{
  x2 <- c(rep(0, t0-1), rep(1, n.of.obs-t0+1))
  # set up the indicator function

  llm <- summary(lm(stockexcess[,i] ~ marketexcess + ?))

  pV <- llm$coef[3,4]
  resSS <- #...
  return(list(pV=pV, resSS=resSS))
}

apply(SSR, 1, which.min) will be helpful for locating which t0 gives the smallest SSR for each stock.
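A possible sketch of the search over candidate change points (assuming the "?" above has been filled in with the indicator-times-market term from part (f), and that resSS returns the residual sum of squares of that fit):

#R code:
t0.grid <- 2:(n.of.obs-1)  # candidate change points
SSR <- matrix(NA, n.of.stocks, length(t0.grid))
for (i in 1:n.of.stocks){
  for (j in seq_along(t0.grid)){
    SSR[i,j] <- changepoint(i, t0.grid[j])$resSS
  }
}
t0.grid[apply(SSR, 1, which.min)]  # least squares estimate of t0 for each stock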

Principal Component Analysis (PCA)



Principal Component Analysis


• Geometric Interpretation

Principal Component Analysis


• Principal Component Analysis (PCA) is a traditional multivariate statistical technique for dimension (variable) reduction.

• The idea is to find linear combinations of the original variables such that the information in the original data is preserved.

• Before we go into the details, we first look at an interesting application of PCA: identifying the important components of the term structure of interest rates.

• The file "us-rate.dat" contains US semi-annualized zero-coupon rates with maturities between 1M and 30Y, monthly data from 1944 to 1992.

Principal Component Analysis


• Implementation in R:
#R code:
d <- read.table("us-rate.csv", sep=",")
label <- c("1m","3m","6m","9m","12m","18m","2y","3y","4y","5y","7y","10y","15y")
names(d) <- label
options(digits=2)
cor(d)

R printout: cor(d) returns the 13 x 13 correlation matrix of the rates (output omitted).

Principal Component Analysis


• First note that these 13 variables are highly correlated. This implies that we can use only a few variables (say 2 or 3) to represent this dataset without losing a large amount of information.

• A natural question to ask is: which variables should we use so that they retain as much of the information in the dataset as possible?

• The answer is not any of the original variables, but linear combinations of the original variables.

• Refer to Prof Ying's lecture and the textbook for the derivation of the principal components.

Principal Component Analysis


• The reason for maximizing the variance of the transformed variable y is that we want to preserve as much information in x as possible. Conceptually, variation represents information.

• We want the variance of the transformed variable y to be as large as possible, so that the original information in x is preserved. The eigenvalue, which equals the variance of y, indicates the amount of information retained in the PC.

PCA using R

• Let us first perform PCA using R and the princomp() function.

Principal Component Analysis


#R code:
pca <- princomp(d, cor=T)
pca$loadings[,1:6]
pc1 <- pca$loadings[,1]  # save the loadings of PC1
pc2 <- pca$loadings[,2]  # save the loadings of PC2

R printout: pca$loadings[,1:4] shows the loadings of the first four PCs (output omitted).
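As a check of the earlier remark that each PC's variance equals an eigenvalue, one can compare princomp with a direct eigen-decomposition of the correlation matrix (a sketch):

#R code:
e <- eigen(cor(d))
round(e$values, 4)    # eigenvalues of the correlation matrix
round(pca$sdev^2, 4)  # variances of the PCs; should match the eigenvalues
# e$vectors[,1] matches pca$loadings[,1] up to a sign change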

Principal Component Analysis


• We want to represent this us-rate dataset by using only a few variables instead of all 13 variables.

• How many PCs should we use? Before answering this question, let us first look at the variance or standard deviation (s.d.) of these PCs.

#R code:
s <- pca$sdev  # save the s.d. of all PCs to s
s              # display s

Principal Component Analysis


• From the above output, we know that the s.d. and variance of the 1st PC are 3.567 and 12.7239, respectively, and so on. The 1st PC explains 12.7239/13 = 97.88% of the total variance. The 2nd PC explains 1.83%, the 3rd PC 0.19%, and so on. If we use only the 1st PC, it already preserves almost all (97.88%) of the information.

• A plot of the variances (called a scree plot) can help us determine a suitable number of PCs. This plot shows graphically the information retained in each PC.

#R code:
screeplot(pca, type="lines")
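The proportions of variance explained can also be computed directly (a small sketch; summary(pca) reports the same numbers):

#R code:
prop <- s^2/sum(s^2)    # proportion of total variance explained by each PC
round(prop, 4)
round(cumsum(prop), 4)  # cumulative proportion
summary(pca)            # s.d., proportion and cumulative proportion of each PC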

Principal Component Analysis

Scree Plot.

Principal Component Analysis


• Interpretation of PCs

  - The 1st PC represents a parallel shift of the yield curve.

  - The 2nd PC agrees with the liquidity preference theory. This theory states that investors prefer to preserve their liquidity and invest funds for short periods of time. This component is termed the tilt component of the yield curve.

  - The 3rd PC can be interpreted as the curvature of the yield curve.
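A small sketch to visualize these interpretations by plotting the loadings of the first three PCs against maturity (the shift, tilt and curvature patterns should be visible):

#R code:
matplot(pca$loadings[,1:3], type="l", lty=1:3, col=1:3,
        xlab="Maturity (1m to 15y)", ylab="Loading")
legend("topright", legend=c("PC1 (shift)", "PC2 (tilt)", "PC3 (curvature)"),
       lty=1:3, col=1:3)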
