

Assignment Walkthroughs and R Demo
W4290 Statistical Methods in Finance | Spring 2010 | Columbia University

Outline

• Assignment 1

• Assignment 2

• Principal Component Analysis (PCA) in R



Assignment 1

Assignment 1
• Problem 3.5

(a) Compute the sample mean μ̂ and the sample covariance matrix Σ̂ of the log returns.

To change the working directory:


#R code:
setwd(<working directory>)
# e.g. "C:/Documents and Settings/tsit/"

To input the data:

returns<-read.table("m_logret_10stocks.txt",
header=TRUE)[1:156,2:11]

Assignment 1
• Problem 3.5

(a) Compute the sample mean μ̂ and the sample covariance matrix Σ̂ of the log returns.

Use mean and var to obtain the estimates:

#R code:
mean(returns)   # column means; in newer versions of R use colMeans(returns)
var(returns)    # or cov(returns)

In MATLAB, however, for a matrix A, var(A) and cov(A) compute different things. Compare:

%MATLAB code:
A = [1,2;3,4;5,6;7,8]
var(A) % column-wise variances (a 1-by-n row vector)
cov(A) % the full n-by-n covariance matrix

Assignment 1
• Problem 3.5

(b) Assume the monthly target return is 0.3% and that short selling is allowed. Estimate the optimal portfolio weights by replacing (μ, Σ) in Markowitz's theory by (μ̂, Σ̂).

#R code:
no.of.stocks <- dim(returns)[2]
# For an m x n matrix M, dim(M) gives
# >> [1] m n
# Using "dim" can help you extract the size of the matrix.

%MATLAB code:
size(M, i)  % i: the i-th dimension

To facilitate the computation, we’ll write a function.



Assignment 1| Functions
• One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are themselves built from other functions. The structure of a function is given below.
myfunction <- function(arg1, arg2, ... ){
statements
return(object)
}
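As a quick illustration (a hypothetical toy example, not part of the assignment), here is how such a function could be defined and called:

#R code:
excess.return <- function(x, target = 0){
  # x: a numeric vector of returns; target: a benchmark return
  ex <- x - target
  return(ex)
}
excess.return(c(0.01, -0.02, 0.005), target = 0.003)
# >> [1]  0.007 -0.023  0.002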

• We can therefore write a function, say Markowitz, that computes the efficient weights automatically once we input the returns dataset and the target return (the arguments).

• The exact solution can be found on pp. 70-71 of Lai & Xing (2008) (Eq. (3.9)-(3.15)).

Assignment 1| Markowitz
#R code:
no.of.stocks <- dim(returns)[2]
Markowitz <- function(return.set, tar.re){
  mu <- mean(return.set)        # column means
  Sigma <- var(return.set)      # sample covariance matrix
  inv.Sigma <- solve(Sigma)     # matrix inverse (MATLAB: inv(Sigma))
  A <- sum(mu%*%inv.Sigma)      # %*% is matrix multiplication
                                # (in MATLAB, compare "*" and ".*")
  B <- mu%*%inv.Sigma%*%mu
  C <- sum(inv.Sigma)
  D <- B*C - A*A
  vol.eff <- sqrt((C*tar.re^2 - 2*tar.re*A + B)/D)
  weights <- matrix(NA, length(tar.re), no.of.stocks)
  for (i in 1:length(tar.re))
    weights[i,] <- (as.numeric(B/D)*inv.Sigma%*%rep(1,10)
                    - as.numeric(A/D)*inv.Sigma%*%mu
                    + tar.re[i]*(as.numeric(C/D)*inv.Sigma%*%mu
                                 - as.numeric(A/D)*inv.Sigma%*%rep(1,10)))
  list(vol.eff=vol.eff, weights=weights)  # return both as a list
}

Assignment 1| Markowitz
#R code:
target.return<-log(1+0.003)
eff.frontier<-Markowitz(returns, target.return)
eff.frontier$weights

mean.sigma<-sqrt(
diag(eff.frontier$weights%*%var(returns)
%*%t(eff.frontier$weights))
)
# ------------------------------------------------------------
# >eff.frontier$weights
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.07135497 -0.02974003 0.6499748 -0.02362635 -0.1913992 0.07879041
# [,7] [,8] [,9] [,10]
# [1,] 0.1617200 0.0964938 0.1630675 0.02336413

• The function diag helps us extract the diagonal elements from a matrix. (It is faster than using a double for-loop.)

Assignment 1| Bootstrap
• Problem 3.5

(c) Do the same as in (b) for Michaud's resampled weights (3.38) using B = 500 bootstrap samples:

\bar{\omega} = B^{-1} \sum_{b=1}^{B} \hat{\omega}^{*}_{b}

Bootstrap algorithm:
1. Set up an index vector: index = {1, 2, …, no.of.obs (156)}.
2. From this index set, randomly draw 156 indices with replacement.
3. Suppose {2, 5, 1, 156, 2, 4, …, 20} is drawn. We copy the corresponding months of returns of ALL 10 stocks from the original data set, following the sequence of indices drawn. In this case, months 2, 5, 1, … of the original data set become the first, second, third, … rows of our new, resampled dataset.

Assignment 1| Bootstrap
• To implement it in R…
#R code:
n.sims <- 500
index <- seq(1, dim(returns)[1])
draws <- matrix(sample(index, size=n.sims*length(index),
                       replace=TRUE), n.sims)
weights.Michaud <- matrix(NA, n.sims, no.of.stocks)
for (b in 1:n.sims){
  portfolio <- returns[draws[b,],]
  weights.Michaud[b,] <- t(Markowitz(portfolio,
                                     target.return)$weights)
}
apply(weights.Michaud, 2, mean)  # Michaud's resampled weights

If you prefer to use MATLAB, you may try:

%MATLAB code:
index = randsample(1:no.of.obs, m, true)
%...

Assignment 1| Plots
• Problem 3.5

(d) Plot the estimated efficient frontier (by varying μ* over a grid) that uses (μ̂, Σ̂) to replace (μ, Σ) in Markowitz's efficient frontier.

We need to generate a sequence of target returns, plug each of them into Markowitz, and obtain one set of efficient weights for each target:

#R code:
mus <- seq(-0.01, 0.015, length.out=100)
vol <- Markowitz(returns, mus)$vol.eff
plot(vol, mus, type="l", xlab="Volatility", ylab="Return")

%MATLAB code:
mus=linspace(-0.01,0.015,100)
%...
plot(mus,vol)
hold on

Assignment 1| Plots
• Problem 3.5

(e) Plot Michaud's resampled efficient frontier using B = 500 bootstrap samples. Compare with the plot in (d).

#R code:
index <- seq(1, dim(returns)[1])
draws <- matrix(sample(index, size=n.sims*length(index),
                       replace=TRUE), n.sims)
weights.Bootstrap <- matrix(0, length(mus), dim(returns)[2])
for (b in 1:n.sims){
  resampled.returns <- returns[draws[b,],]
  weights.Bootstrap <- weights.Bootstrap +
    Markowitz(resampled.returns, mus)$weights
}
weights.Bootstrap <- weights.Bootstrap/n.sims
mean.mu <- t(weights.Bootstrap%*%mean(returns))
mean.sigma <- sqrt(diag(weights.Bootstrap%*%
                        var(returns)%*%t(weights.Bootstrap)))
points(mean.sigma, mean.mu, type="l", col="Red")

Assignment 1| Plots

Efficient frontiers obtained from different numbers of resamplings: red = 10, blue = 500 (left); efficient frontiers obtained from each resampled dataset (right).

Assignment 2

Assignment 2
• Problem 3.6

(a) Fit the CAPM to the ten stocks. Give point estimates and 95% confidence intervals of α, β, the Sharpe index, and the Treynor index.

We have to transform the interest rate from an annual return into a monthly return so that it is compatible with the monthly returns of the stocks:

(1 + r_{\text{monthly}})^{12} = 1 + r_{\text{annual}}


#R code:
marketreturns[,2] <- (1+marketreturns[,2] / 100)^(1/12)-1
marketexcess <- marketreturns[,1]- marketreturns[,2]
stockexcess <- #similarly…

n.of.stocks <- #use dim(stockexcess)[?]


n.of.obs <- #...

Assignment 2
• The parameters in the CAPM can be obtained by running linear regressions, lm(y ~ x), of the excess equity returns on the excess market returns for each of the ten stocks in the data set.

#R code:
# Remember to set up vectors/matrices (alpha.coef, alpha.sd, beta.coef,
# beta.sd, rres, sharpe.coef, treynor.coef) before the loop.
for(i in 1:n.of.stocks){
  llm <- summary(lm(stockexcess[,i] ~ marketexcess))
  alpha.coef[i] <-
    # try names(llm) and use llm$<…> to extract useful info.
  alpha.sd[i] <- #...
  beta.coef[i] <- #...
  beta.sd[i] <- #...
  rres[i,] <- #...
  sharpe.coef[i] <- mean(stockexcess[,i])/sd(stockexcess[,i])
  treynor.coef[i] <- mean(stockexcess[,i])/beta.coef[i]
  #...
}

#names(llm)
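One possible way to fill in these placeholders (a sketch, assuming stockexcess, marketexcess, n.of.stocks and n.of.obs are set up as above):

#R code:
alpha.coef <- alpha.sd <- beta.coef <- beta.sd <- rep(NA, n.of.stocks)
sharpe.coef <- treynor.coef <- rep(NA, n.of.stocks)
rres <- matrix(NA, n.of.stocks, n.of.obs)
for (i in 1:n.of.stocks){
  llm <- summary(lm(stockexcess[,i] ~ marketexcess))
  alpha.coef[i] <- llm$coefficients[1,1]  # intercept estimate (alpha)
  alpha.sd[i]   <- llm$coefficients[1,2]  # its standard error
  beta.coef[i]  <- llm$coefficients[2,1]  # slope estimate (beta)
  beta.sd[i]    <- llm$coefficients[2,2]
  rres[i,]      <- llm$residuals          # residuals, reused in part (d)
  sharpe.coef[i]  <- mean(stockexcess[,i])/sd(stockexcess[,i])
  treynor.coef[i] <- mean(stockexcess[,i])/beta.coef[i]
}
# 95% CI for alpha, e.g.: alpha.coef + qt(0.975, n.of.obs-2) * alpha.sd %o% c(-1, 1)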

Assignment 2 | Delta Method


• Because it is not easy to calculate Var(μ̂/σ̂) and Var(μ̂/β̂) directly, we use the Delta Method to approximate these values.

• What is the Delta Method? According to Lai & Xing (2008), page 59:

g(\hat{\theta}) - g(\theta_0) \xrightarrow{D} N\!\left(0,\ (\nabla g(\hat{\theta}))' \left(-\nabla^2 l_n(\hat{\theta})\right)^{-1} \nabla g(\hat{\theta})\right)

… =_=’’

Assignment 2 | Delta Method


• In fact, you can view it as a way to obtain the variance through a Taylor series expansion.

Recall that if \hat{\theta}_n \approx \theta_0, then for any "smooth" function \varphi(\cdot),

\varphi(\hat{\theta}_n) - \varphi(\theta_0) \approx \varphi'(\theta_0)(\hat{\theta}_n - \theta_0) + \tfrac{1}{2}\varphi''(\theta_0)(\hat{\theta}_n - \theta_0)^2 + \dots

If we take the variance on both sides and treat \varphi^{(n)}(\theta_0) \approx \varphi^{(n)}(\hat{\theta}_n), then

\mathrm{Var}(\varphi(\hat{\theta}_n)) \approx [\varphi'(\hat{\theta}_n)]^2 \, \mathrm{Var}(\hat{\theta}_n - \theta_0).


Assignment 2 | Delta Method


• If we let \varphi(\mu, \sigma) = \mu/\sigma and again use a Taylor series expansion of the function, we will have

\varphi(\hat{\mu}, \hat{\sigma}) \approx \varphi(\mu, \sigma) + \varphi'_1(\mu, \sigma)(\hat{\mu} - \mu) + \varphi'_2(\mu, \sigma)(\hat{\sigma} - \sigma) + \dots

where

\varphi'_1 := \tfrac{\partial}{\partial x}\varphi(x, y) = y^{-1}, \qquad \varphi'_2 := \tfrac{\partial}{\partial y}\varphi(x, y) = -x y^{-2}.

\begin{aligned}
\mathrm{Var}(\hat{S}) &\approx \mathrm{Var}\left\{ \varphi(\mu, \sigma) + \varphi'_1(\mu, \sigma)(\hat{\mu} - \mu) + \varphi'_2(\mu, \sigma)(\hat{\sigma} - \sigma) + \dots \right\} \\
&= [\varphi'_1(\mu, \sigma)]^2 \mathrm{Var}(\hat{\mu}) + [\varphi'_2(\mu, \sigma)]^2 \mathrm{Var}(\hat{\sigma}) + 2\,\varphi'_1(\mu, \sigma)\varphi'_2(\mu, \sigma)\,\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \\
&\overset{\text{indep.}}{\approx} [\varphi'_1(\hat{\mu}, \hat{\sigma})]^2 \mathrm{Var}(\hat{\mu}) + [\varphi'_2(\hat{\mu}, \hat{\sigma})]^2 \mathrm{Var}(\hat{\sigma}) \\
&= \dots
\end{aligned}
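To make this concrete, here is a minimal R sketch of the delta-method standard error of the Sharpe ratio for one stock, assuming i.i.d. (approximately normal) excess returns so that Var(μ̂) ≈ σ̂²/n and Var(σ̂) ≈ σ̂²/(2n):

#R code:
x <- stockexcess[,1]              # excess returns of one stock
n <- length(x)
mu.hat <- mean(x); sigma.hat <- sd(x)
sharpe.hat <- mu.hat/sigma.hat
var.mu    <- sigma.hat^2/n        # Var of the sample mean
var.sigma <- sigma.hat^2/(2*n)    # approximate Var of the sample s.d.
var.sharpe <- (1/sigma.hat)^2*var.mu + (mu.hat/sigma.hat^2)^2*var.sigma
sd.sharpe  <- sqrt(var.sharpe)    # delta-method standard error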

Assignment 2
• The variance of the Treynor ratio can be computed in a similar fashion. Notice that for this ratio, the covariance term also vanishes.¹

• (b) Use the bootstrap procedure in Section 3.5 to estimate the standard errors of the point estimates of α, β, the Sharpe and Treynor indices.

#R code:
# This part will be very similar to part (c) in Problem 3.5. Make
# sure that you have set up matrices to store the estimated
# values. It is a combination of the bootstrap technique that you
# just learnt and the code for part (a) of Problem 3.6

¹ The covariance between the mean of the excess stock return and the beta estimate should also be 0.
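A minimal sketch of the bootstrap in (b), assuming the part (a) loop has been wrapped in a helper function, say capm.estimates(stockexcess, marketexcess), that returns the vector of point estimates (this helper name is hypothetical, not from the textbook):

#R code:
n.sims <- 500
boot.est <- matrix(NA, n.sims, 4*n.of.stocks)  # alpha, beta, Sharpe, Treynor
for (b in 1:n.sims){
  idx <- sample(1:n.of.obs, n.of.obs, replace=TRUE)
  boot.est[b,] <- capm.estimates(stockexcess[idx,], marketexcess[idx])
}
apply(boot.est, 2, sd)  # bootstrap standard errors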

Assignment 2
• Problem 3.6

(c) Test for each stock the null hypothesis α = 0. (Skipped)

(d) Use the regression model (3.24) to test for the ten stocks the null hypothesis α = 0.

This problem can be solved by using (3.26) and (3.28):

\hat{V} = n^{-1} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)(y_i - \hat{\alpha} - \hat{\beta} x_i)'

and

\frac{n-q-1}{q}\, \hat{\alpha}' \hat{V}^{-1} \hat{\alpha} \left\{ 1 + \frac{\bar{x}^2}{\frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2} \right\}^{-1} \sim F_{q,\, n-q-1}

under H0.

Assignment 2
• The corresponding R code is shown below. Notice that we can extract the residuals from the regression results; they were stored in rres in part (a).

#R code:
alpha.level <- 0.05

Vhat <- 1/n.of.obs * rres %*% t(rres)  # Formula (3.26)
nnumerator <- t(alpha.coef) %*% solve(Vhat) %*% alpha.coef
ddenominator <- 1 + mean(marketexcess)^2 /
  (var(marketexcess)*(n.of.obs-1)/n.of.obs)
dof <- n.of.obs - n.of.stocks - 1
sstat <- dof/n.of.stocks * nnumerator / ddenominator  # Formula (3.28)
fstat <- qf(1-alpha.level, n.of.stocks, dof)
print(sstat)  # reject H0: alpha = 0 if sstat > fstat
print(fstat)

Assignment 2
• Problem 3.6

(e) Perform a factor analysis on the excess returns of the ten stocks. Show the factor loadings and the rotated factor loadings. Explain your choice of the number of factors.

Use factanal and varimax:

#R code:
ana2<-factanal(stockexcess, factors=2, rotation="none")
rot2<-varimax(loadings(ana2), normalize = TRUE)

#... Try factors=3,4… and discuss/argue.
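One possible way to compare different numbers of factors (a sketch, using the likelihood-ratio test p-value that factanal reports):

#R code:
for (k in 1:4){
  fit <- factanal(stockexcess, factors=k)
  cat(k, "factor(s): p-value of the 'k factors are sufficient' test:", fit$PVAL, "\n")
}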



Assignment 2 | Testing
• Problem 3.6

(f) Consider the model

r^e_t = \beta_1 1_{\{t < t_0\}} r^e_{M,t} + \beta_2 1_{\{t \ge t_0\}} r^e_{M,t} + \epsilon_t,

in which r^e_t = r_t - r_f and r^e_{M,t} = r_{M,t} - r_f are the excess returns of the stock and the S&P500 index. The model suggests that the β in the CAPM might not be a constant (i.e. β1 ≠ β2). Taking February 2001 as the month t0, test for each stock the null hypothesis that β1 = β2.

Hint: Transform the model into

r^e_t = (\beta_1 - \beta_2) 1_{\{t < t_0\}} r^e_{M,t} + \beta_2 r^e_{M,t} + \epsilon_t,

and test for β1 - β2 = 0.
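A minimal sketch of this test for stock i (assuming t0 is set to the row index of February 2001 in your sample, and using the objects defined on the earlier slides):

#R code:
ind <- as.numeric(seq_len(n.of.obs) < t0)  # the indicator 1{t < t0}
fit <- summary(lm(stockexcess[,i] ~ marketexcess + I(ind*marketexcess)))
fit$coefficients[3,]  # row 3: estimate of beta1 - beta2, its s.e., t value, p-value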



Assignment 2 | Change point


• Problem 3.6

(g) Estimate t0 in (f) by the least squares criterion that minimises the residual sum of squares over the parameters.

#R code:
changepoint <- function(i, t0)
{
  x2 <- c(rep(0, t0-1), rep(1, n.of.obs-t0+1))
  # set up the indicator function

  llm <- summary(lm(stockexcess[,i] ~ marketexcess + ?))

  pV <- llm$coef[3,4]
  resSS <- #...
  return(list(pV=pV, resSS=resSS))
}

apply(SSR, 1, which.min) will be helpful for locating which t0 gives the smallest SSR for each stock.
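A possible sketch of the search over candidate change points (assuming the "?" above has been filled in with the indicator-times-market term from part (f), and that resSS returns the residual sum of squares of that fit):

#R code:
t0.grid <- 2:(n.of.obs-1)  # candidate change points
SSR <- matrix(NA, n.of.stocks, length(t0.grid))
for (i in 1:n.of.stocks){
  for (j in seq_along(t0.grid)){
    SSR[i,j] <- changepoint(i, t0.grid[j])$resSS
  }
}
t0.grid[apply(SSR, 1, which.min)]  # least squares estimate of t0 for each stock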

Principal Component Analysis (PCA)



Principal Component Analysis


• Geometric Interpretation

Principal Component Analysis


• Principal Component Analysis (PCA) is a traditional multivariate statistical technique for dimension (variable) reduction.

• The idea is to find linear combinations of the original variables such that the information in the original data is preserved.

• Before we go into the details, we first look at an interesting application of PCA: identifying the important components of the term structure of interest rates.

• The file "us-rate.dat" contains US semi-annualized zero-coupon rates with maturities between 1M and 30Y, monthly data from 1944 to 1992.

Principal Component Analysis


• Implementation in R:
#R code:
d <- read.table("us-rate.csv", sep=",")
label <- c("1m","3m","6m","9m","12m","18m","2y","3y","4y","5y","7y","10y","15y")
names(d) <- label
options(digits=2)
cor(d)

R printout: cor(d) returns the 13 x 13 correlation matrix of the rates (output omitted).

Principal Component Analysis


• First note that these 13 variables are highly correlated. This implies that we can use only a few variables (say 2 or 3) to represent this dataset without losing a large amount of information.

• A natural question to ask is: which variables should we use so that they retain as much of the information in the dataset as possible?

• The answer is not any of the original variables, but linear combinations of the original variables.

• Refer to Prof Ying's lecture and the textbook for the derivation of the principal components.

Principal Component Analysis


• The reason for maximizing the variance of the transformed variable y is that we want to preserve as much information in x as possible. Conceptually, variation represents information.

• We want the variance of the transformed variable y to be as large as possible, so that the original information in x is preserved. The eigenvalue, which equals the variance of y, indicates the amount of information retained in the PC.

PCA using R

• Let us first perform PCA using R and the princomp() function.

Principal Component Analysis


#R code:
pca <- princomp(d, cor=T)
pca$loadings[,1:6]
pc1 <- pca$loadings[,1]  # save the loadings of PC1
pc2 <- pca$loadings[,2]  # save the loadings of PC2

R printout: pca$loadings[,1:4] shows the loadings of the first four PCs (output omitted).
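As a check of the earlier remark that each PC's variance equals an eigenvalue, one can compare princomp with a direct eigen-decomposition of the correlation matrix (a sketch):

#R code:
e <- eigen(cor(d))
round(e$values, 4)    # eigenvalues of the correlation matrix
round(pca$sdev^2, 4)  # variances of the PCs; should match the eigenvalues
# e$vectors[,1] matches pca$loadings[,1] up to a sign change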

Principal Component Analysis


• We want to represent this us-rate dataset by using only a few variables instead of all 13 variables.

• How many PCs should we use? Before answering this question, let us first look at the variance or standard deviation (s.d.) of these PCs.

#R code:
s <- pca$sdev  # save the s.d. of all PCs to s
s              # display s

Principal Component Analysis


• From the above output, we know that the s.d. and variance of the 1st PC are 3.567 and 12.7239, respectively, and so on. The 1st PC explains 12.7239/13 = 97.88% of the total variance. The 2nd PC explains 1.83%, the 3rd PC 0.19%, and so on. If we use only the 1st PC, it already preserves almost all (97.88%) of the information.

• A plot of the variances (called a scree plot) can help us determine a suitable number of PCs. This plot shows graphically the information retained in each PC.

#R code:
screeplot(pca, type="lines")
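The proportions of variance explained can also be computed directly (a small sketch; summary(pca) reports the same numbers):

#R code:
prop <- s^2/sum(s^2)    # proportion of total variance explained by each PC
round(prop, 4)
round(cumsum(prop), 4)  # cumulative proportion
summary(pca)            # s.d., proportion and cumulative proportion of each PC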

Principal Component Analysis

Scree Plot.

Principal Component Analysis


• Interpretation of PCs

  - The 1st PC represents a parallel shift of the yield curve.

  - The 2nd PC agrees with the liquidity preference theory. This theory states that investors prefer to preserve their liquidity and invest funds for short periods of time. This component is termed the tilt component of the yield curve.

  - The 3rd PC can be interpreted as the curvature of the yield curve.
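A small sketch to visualize these interpretations by plotting the loadings of the first three PCs against maturity (the shift, tilt and curvature patterns should be visible):

#R code:
matplot(pca$loadings[,1:3], type="l", lty=1:3, col=1:3,
        xlab="Maturity (1m to 15y)", ylab="Loading")
legend("topright", legend=c("PC1 (shift)", "PC2 (tilt)", "PC3 (curvature)"),
       lty=1:3, col=1:3)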
