
Chapter 1 of Asset Pricing by John Cochrane:

Derivations and Thoughts


Kevin Egan (kevin.egan@gmail.com)

version 0.0.1 (20180129)

Introduction
The consumption capital asset pricing model (CCAPM) and related properties introduced in
the first chapter of Asset Pricing (Revised Edition) by John Cochrane [1] are fascinating and
powerful. However, as someone fairly new to the material there were a number of times where
I wanted to see more intermediate derivations or explanations for various claims. These are
the (undoubtedly imperfect) notes I made as I went through the material and attempted
to derive the equations for myself. I would guess that it would be most useful for a new
reader to read a titled subsection of the book, think about the concepts and go through as
much of the math as you want to yourself, then finally refer to the relevant section in these
notes for any additional derivations or comments. I have relegated my two biggest issues
with the chapter’s presentation to appendices at the end. This is an early draft so there are
likely to be typos and mistakes in analysis. Each of the subsections in the notes matches the
corresponding subsection of the revised edition of the book (where I have also numbered the
sub-subsections).

Note that as of October 2017 Prof. Cochrane also very generously posted additional
materials including detailed notes, problem sets and video lectures which you can find at the
following URLs:
https://faculty.chicagobooth.edu/john.cochrane/teaching/asset_pricing.htm
https://faculty.chicagobooth.edu/john.cochrane/teaching/35150_advanced_investments/
https://faculty.chicagobooth.edu/john.cochrane/teaching/35904_Asset_Pricing/

The first chapter is available for free here:


https://assets.press.princeton.edu/chapters/s7836.pdf

And the online course is currently run by the Canvas network:


https://www.canvas.net/browse/uc-booth/courses/asset-pricing

If you search for answers to the problem sets you may be able to find those but they are
not officially posted by Prof. Cochrane.

1 Notes for the first chapter
I would highly recommend reading the 35150 “Week 5 Asset Pricing Theory” notes which
are currently here:

https://faculty.chicagobooth.edu/john.cochrane/teaching/35150_advanced_investments/
theory_notes.pdf

The first thing is to look over the description of the payoff $x_{t+1}$ and make sure you
understand the concept of a payoff function that depends on possible future states of the
world (the start of Chapter 3 in the book also discusses this). When evaluating how much to
pay for an asset with payoff $x_{t+1}$ we use our current knowledge at time $t$ to predict states of
the world at time $t+1$, then we can see how the payoff reacts in these predicted states. One
example is evaluating an insurance asset. Suppose there are two future states
in a world where we are optimizing our candy bar consumption: in state 1 our stockpile of
candy bars is safe (we predict this is a 99% chance), and in state 2 our entire stockpile of
candy bars has been robbed. If we buy an asset that has no payoff in state 1 but has a
hefty payoff in state 2 we can smooth out our candy bar consumption stream by protecting
ourselves in case of robbery. This asset acts like insurance which smooths our consumption
stream. Because people like to have stable consumption a rational consumer might purchase
this asset even if it loses money on average (just as rational people purchase insurance). If
we were evaluating a second asset that had a random 1% chance of winning lots of candy
bars but was not correlated with whether or not we got robbed that second asset would
be much less attractive (a lottery ticket is less attractive than insurance). And finally we
could evaluate a third asset that always paid a small amount when we were not robbed
and paid out nothing or negative values when we were robbed. This last asset is the most
risky (it exacerbates large drops in our possible future consumption). If all assets had the
same average payout we would be willing to pay the most for the first asset that acted like
insurance and the least for the last asset which was the most risky.
When we measure covariance between things like the future payoff $x_{t+1}$ and future
consumption $c_{t+1}$ we are essentially tabulating all of the states of the world and seeing how the
payoff matches up with our expected consumption (the covariance for insurance assets will
be negative because when our consumption dips the insurance pays off and vice versa).
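The ordering of the three candy bar assets can be checked with a small numerical sketch. Everything in it is an assumption made for illustration: power utility with $\gamma = 2$, $\beta = 0.95$, consumption of 10 bars when safe and 1 bar when robbed (1 rather than 0 so that marginal utility stays finite), and prices evaluated at the margin, before any of the asset is bought.

```python
# Hypothetical numbers for the candy bar example; none of them come from the book.
beta = 0.95                     # subjective discount factor
gamma = 2.0                     # power-utility risk aversion, u'(c) = c**(-gamma)
c_now = 10.0                    # candy bars consumed today
c_safe, c_robbed = 10.0, 1.0    # tomorrow's consumption if safe / if robbed
p_robbed, p_win = 0.01, 0.01    # robbery and lottery win are independent events

def sdf(c_next):
    """Stochastic discount factor m = beta * u'(c_next) / u'(c_now)."""
    return beta * (c_next / c_now) ** (-gamma)

# Enumerate the four joint states as (probability, robbed?, lottery win?).
states = []
for robbed, pr in ((False, 1 - p_robbed), (True, p_robbed)):
    for win, pw in ((False, 1 - p_win), (True, p_win)):
        states.append((pr * pw, robbed, win))

def price(payoff):
    """p = E[m x], with consumption held at its no-purchase values."""
    return sum(prob * sdf(c_robbed if robbed else c_safe) * payoff(robbed, win)
               for prob, robbed, win in states)

def mean(payoff):
    """Expected payoff E[x]."""
    return sum(prob * payoff(robbed, win) for prob, robbed, win in states)

# Three assets, each scaled so that the expected payoff is one candy bar.
def insurance(robbed, win):
    return 1 / p_robbed if robbed else 0.0

def lottery(robbed, win):
    return 1 / p_win if win else 0.0

def risky(robbed, win):
    return 0.0 if robbed else 1 / (1 - p_robbed)

for name, asset in (("insurance", insurance), ("lottery", lottery), ("risky", risky)):
    print(f"{name:9s} mean payoff = {mean(asset):.3f}  price = {price(asset):.4f}")
```

With these assumed numbers all three assets have the same mean payoff, yet the insurance asset is priced highest and the risky asset lowest, matching the ordering described above.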
There are two questions in the extras section of the notes near the bottom that I think
are useful to keep in mind during the first chapter of the book (briefly summarized below):

1. Question: The chicken and the egg. Does consumption determine asset returns, or do
asset returns determine consumption?
(a) Answer: in general both. Individual investors see asset prices and payoffs as fixed,
and our basic equation determines their consumption. To the economy, however, the equa-
tion may determine asset prices given consumption... [See notes for different examples of
endowment economy vs. linear technology vs. reality where the technology of production
determine what inputs drives changes in a macro economy. This is also discussed in Chapter
2 of the book.]

6. Question: you slipped in to talking about economy-wide average consumption, not
individual consumption. What’s up with that?
Answer: Right. There is a “theory of aggregation” that lets us do this. Here’s what
needs to be proved: that the average consumption across people responds to market prices
just as if there is a single consumer with “average” risk aversion γ and discount rate β doing
the choosing. Under some assumptions, it’s true... [I wish the book had stated this more
clearly and had given some kind of reference to material that explores these assumptions.
This is also discussed in Appendix B.]

1.1 Basic Pricing Equation


One quibble I have with the presentation is mixing the use of random variables such as the
optimal amount of an asset a consumer buys $\xi$ and functions of those random variables such
as current consumption $c_t = e_t - p_t \xi$. Mixing these two things together can make us think
that two variables are independent when they are actually linked (such as $c_t$, $c_{t+1}$ and $p_t$ all
depending on the units purchased $\xi$). This in turn can lead to us making inferences about
equations that I don’t feel are mathematically robust (see Appendix A).
Here is the full derivation of the utility optimization where we keep ξ as a variable and
write out all other terms as functions of ξ:

\begin{align}
\max_{\xi}\ & u(c_t(\xi)) + E_t[\beta u(c_{t+1}(\xi))] \tag{1} \\
c_t(\xi) &= e_t - p_t \xi \tag{2} \\
c_{t+1}(\xi) &= e_{t+1} + x_{t+1} \xi \tag{3}
\end{align}

Setting the derivative of what we want to optimize to zero to find critical points:

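\begin{align}
0 &= \frac{\partial}{\partial \xi} \Big[ u(c_t(\xi)) + E_t\big[\beta u(c_{t+1}(\xi))\big] \Big] \tag{4} \\
0 &= \frac{\partial}{\partial \xi} \Big[ u(e_t - p_t \xi) + E_t\big[\beta u(e_{t+1} + x_{t+1} \xi)\big] \Big] \tag{5} \\
0 &= u'(e_t - p_t \xi)(-p_t) + E_t\big[\beta u'(e_{t+1} + x_{t+1} \xi)\, x_{t+1}\big] \tag{6} \\
p_t\, u'(e_t - p_t \xi) &= E_t\big[\beta u'(e_{t+1} + x_{t+1} \xi)\, x_{t+1}\big] \tag{7} \\
p_t &= E_t\left[ \frac{\beta u'(e_{t+1} + x_{t+1} \xi)}{u'(e_t - p_t \xi)}\, x_{t+1} \right] \tag{8} \\
p_t &= E_t\left[ \frac{\beta u'(c_{t+1})}{u'(c_t)}\, x_{t+1} \right] \tag{9} \\
p_t &= E_t[m_{t+1} x_{t+1}] \tag{10}
\end{align}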

Eqn 9 matches equation 1.2 in the book. If we use a simple utility function $u(c) = \ln(c)$
with $u'(c) = 1/c$ then we get:

\begin{align}
p_t = E_t\left[ \frac{\beta (e_t - p_t \xi)}{e_{t+1} + x_{t+1} \xi}\, x_{t+1} \right] \tag{11}
\end{align}
But from here it is difficult to isolate $\xi$, the amount of the asset we would purchase at time $t$.
The book says that it stops short of a solution putting independent (exogenous) variables
on one side and functions or variables that are dependent on other values (endogenous) on
the other side, and I believe this is because Eqn 11 does not have an easy solution. In my
opinion thinking of consumption values $c_t$ and $c_{t+1}$ as variables is often convenient, but it
is helpful to realize that the two variables are coupled by the $\xi$ variable (more consumption
now often leads to less later and vice versa).
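Although Eqn 11 has no convenient closed form, $\xi$ can be found numerically. The sketch below is a minimal illustration under assumed values (two equally likely states, log utility, and made-up endowments, payoffs and price); the helper names `foc` and `solve_xi` are hypothetical. It applies bisection to the first-order condition in Eqn 6 and then confirms that Eqn 9 reproduces the given price at the optimum.

```python
# Illustrative two-state economy; every number here is an assumption.
beta = 0.95
e_now = 10.0                 # endowment today, e_t
p = 1.0                      # price of one unit of the asset, p_t
probs = [0.5, 0.5]           # state probabilities
e_next = [10.0, 10.0]        # endowment tomorrow in each state, e_{t+1}(s)
x_next = [1.5, 0.8]          # asset payoff in each state, x_{t+1}(s)

def foc(xi):
    """Derivative of u(c_t) + E[beta u(c_{t+1})] with respect to xi for u(c) = ln(c)."""
    today = -p / (e_now - p * xi)
    tomorrow = sum(pr * beta * x / (e + x * xi)
                   for pr, e, x in zip(probs, e_next, x_next))
    return today + tomorrow

def solve_xi(lo, hi, tol=1e-12):
    """Bisection; needs foc(lo) > 0 > foc(hi). foc is decreasing because u is concave."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Bracket chosen so that consumption stays strictly positive in every state.
xi_star = solve_xi(-6.0, 9.9)

c_now = e_now - p * xi_star
c_next = [e + x * xi_star for e, x in zip(e_next, x_next)]
# Eqn 9 with log utility: m = beta * c_t / c_{t+1}
implied_price = sum(pr * beta * c_now / c * x
                    for pr, c, x in zip(probs, c_next, x_next))
print(f"xi* = {xi_star:.6f}, given p = {p}, E[m x] at the optimum = {implied_price:.6f}")
```

The check at the end illustrates the coupling discussed above: $E_t[m_{t+1} x_{t+1}]$ equals $p_t$ only once $c_t$ and $c_{t+1}$ are evaluated at the optimal $\xi$.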
NOTE 1: This is not explicitly stated in the book but I believe that in Eqn 3 et+1 is
a random variable that depends on the future state of the world, whereas et is a known
scalar for immediate planned consumption (similar to ct and ct+1 ). This makes it so that
our default future consumption can have non-zero variance, which in turn makes assets like
insurance more valuable. See Section 1.4.3 in these notes for more details.
NOTE 2: Eqn 9 can be a little misleading because it can look like we can compute pt by
plugging in values on the right. However, according to our original assumptions pt is actually
fixed at time t, and Eqn 8 shows that pt appears on both the left and right hand side of the
equation (in Eqn 9 pt is swallowed up inside of ct ). See Appendix A for more details.
NOTE 3: I believe that we are assuming that consumption is non-negative. Many utility
models, including the power utility model, do not work with negative consumption values. I think
this is important because it means that either our utility function needs to go to $-\infty$ as
we approach zero consumption to make sure we always choose to consume a small positive
amount, or the optimization we are doing for $\xi$ needs to be a constrained optimization to
make sure that $c_t$ and $c_{t+1}$ are never negative. The common power utility model
$u(c) = c^{1-\gamma}/(1-\gamma)$ tends to $-\infty$ as we approach zero consumption only when $\gamma \geq 1$
(taking the $\gamma = 1$ case to be log utility), so in my mind
$\gamma \geq 1$ is a hard constraint unless we want to verify for a given problem that all future states
have non-negative consumption or we are willing to run a constrained optimization instead
of using the equilibrium equations. This also means that the default current and future
consumption, $e_t$ and $e_{t+1}$ respectively, must be non-negative.
NOTE 4: I believe that our assumption about the concavity of the utility requires that
$\gamma \geq 0$ (which is a looser bound than the previous item). To meet our assumption that
the $u(c)$ function is concave the second derivative $u''(c) = -\gamma c^{-\gamma-1}$ must be non-positive.
Assuming for the moment that consumption $c$ is positive, $c^{-\gamma-1}$ will be positive, and therefore
to keep $u''(c) \leq 0$ we must have a non-negative $\gamma$ value so that $-\gamma$ keeps $u''(c)$ non-positive.
If $\gamma$ is positive then as $c$ approaches 0 from above we have $u''(c) \to -\infty$,
which keeps the property that $u''(c) \leq 0$ for $c > 0$. If $\gamma$ is negative then $u''(c) > 0$
for every $c > 0$ and our concavity assumption breaks. If $\gamma$ is zero then $u''(c) = 0$
no matter what $c$ is. So to meet our concavity assumption that $u''(c) \leq 0$ I believe we must
constrain $\gamma \geq 0$.
NOTE 5: Given that we are defining $u(c)$ as an increasing function that is concave it
would be interesting if you could prove that the multi-period utility expression in Eqn 1
can be solved using a simple strategy such as “as long as my overall utility increases keep
buying small amounts of the asset” (essentially doing a dumbed down version of gradient
ascent that consumers can intuitively do in their head). Showing that relatively simple
strategies do find the optimal overall utility would strengthen the argument that a broad
range of consumers across different markets can easily employ these kinds of optimizations
when they make decisions about consumption.
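As a numerical illustration only (not a proof), the sketch below applies such a greedy rule to an assumed two-state economy with log utility. The function `greedy_purchase` and all parameter values are hypothetical. The rule trades a small amount whenever that raises total utility and halves the trade size once neither buying nor selling helps.

```python
import math

# Illustrative two-state setup; all numbers are assumptions.
beta, e_now, p = 0.95, 10.0, 1.0
probs, e_next, x_next = [0.5, 0.5], [10.0, 10.0], [1.5, 0.8]

def total_utility(xi):
    """u(c_t) + E[beta u(c_{t+1})] from Eqn 1 with u(c) = ln(c)."""
    return math.log(e_now - p * xi) + beta * sum(
        pr * math.log(e + x * xi) for pr, e, x in zip(probs, e_next, x_next))

def greedy_purchase(step=0.5, min_step=1e-6):
    """Trade in whichever direction raises utility; halve the step when stuck."""
    xi = 0.0
    while step > min_step:
        if total_utility(xi + step) > total_utility(xi):
            xi += step
        elif total_utility(xi - step) > total_utility(xi):
            xi -= step
        else:
            step /= 2
    return xi

xi_greedy = greedy_purchase()
# A centered finite difference approximates the first-order condition at the result.
slope = (total_utility(xi_greedy + 1e-4) - total_utility(xi_greedy - 1e-4)) / 2e-4
print(f"greedy holding = {xi_greedy:.5f}, marginal utility of one more unit = {slope:.2e}")
```

Because the objective is concave in $\xi$, every accepted trade moves toward the single maximum, which is why the final slope is close to zero in this example.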
NOTE 6: It is a little strange to me that in the introduction the book says “Marginal
utility, not consumption, is the fundamental measure of how you feel”. I would argue that
utility is the fundamental measure of how you feel. When we compute the optimal amount of
an asset to own we optimize for utility instead of marginal utility. Marginal utility is a very
important tool and it dictates choices and behavior, but in the end that very important tool
is used to maximize what we care about, which is utility.

1.2 Marginal Rate of Substitution/Stochastic Discount Factor


Equation 1.5 from the book is defined in this subsection but information from later sections
of the book is needed to derive it. The book explains in Sec 1.3 that capital R variables
represent gross return (something like 1.05) while lowercase r variables express net return,
something like 0.05 (R − 1 = r, or sometimes ln(R) = r). The later section also explains
that “We can think of a return as a payoff with price one”, giving:

\begin{align}
1 = E[m_{t+1} R_{t+1}] \tag{12}
\end{align}

In Sec 1.4 the book points out that the underlying asset for the risk-free rate has no
variability so the rate should be known ahead of time:

\begin{align}
1 &= E[m_{t+1}]\, R^f \tag{13} \\
R^f &= 1/E[m_{t+1}] \tag{14}
\end{align}

Farther down there is another equation:

\begin{align}
p^i_t = E_t[x^i_{t+1}]/R^i \tag{15}
\end{align}

where $R^i$ is a risk-adjusted discount factor that appears to be a scalar instead of a random
variable. Note that the book says that riskier assets are “often valued” using this equation,
but that Eqn 10 ($p_t = E_t[m_{t+1} x_{t+1}]$) is a generalization. The way I read this was that we
should stick with using Eqn 10 but that it is useful to know Eqn 15 because it is commonly
used in practice. Eqn 15 does not take into account the covariance of the payoff and stochastic
discount factor so it is not as accurate, but could be a useful approximation if we don’t have
covariance information.
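One way to see how the two formulas relate is to price a payoff both ways in a tiny discrete example. The three equally likely states and the values of $m_{t+1}$ and $x_{t+1}$ below are invented for illustration. The sketch computes $p_t = E_t[m_{t+1} x_{t+1}]$, splits it into $E_t[x_{t+1}]/R^f$ plus a covariance term, and backs out the scalar $R^i$ that would make Eqn 15 give the same price.

```python
# Three equally likely future states; the numbers are made up for illustration.
probs = [1 / 3, 1 / 3, 1 / 3]
m = [1.10, 0.95, 0.80]     # stochastic discount factor m_{t+1}(s)
x = [0.50, 1.00, 1.60]     # payoff x_{t+1}(s), low exactly when m is high

def expect(values):
    """Probability-weighted average over the three states."""
    return sum(pr * v for pr, v in zip(probs, values))

price = expect([mi * xi for mi, xi in zip(m, x)])   # Eqn 10: p = E[m x]
risk_free = 1 / expect(m)                           # Eqn 14: R^f = 1 / E[m]
cov_mx = price - expect(m) * expect(x)              # cov[m, x] = E[m x] - E[m] E[x]
implied_Ri = expect(x) / price                      # the R^i that makes Eqn 15 hold

print(f"p = E[m x]                   = {price:.4f}")
print(f"E[x]/R^f + cov[m, x]         = {expect(x) / risk_free + cov_mx:.4f}")
print(f"E[x]/R^f (no risk adjustment) = {expect(x) / risk_free:.4f}")
print(f"R^f = {risk_free:.4f}, implied R^i = {implied_Ri:.4f}")
```

In this example the payoff is low when $m_{t+1}$ is high, so the covariance term is negative and the implied $R^i$ exceeds $R^f$; discounting at $R^f$ alone would overstate the price.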
The book mentions how $m_{t+1}$ is sometimes called the pricing kernel. This concept
of a kernel refers to an integral transform $T$ with kernel $K$ where the transformed function
$(Tf)$ takes values in the space of $u$ while feeding the function $f$ values in the space of $t$.

\begin{align}
(Tf)(u) = \int_S K(t, u)\, f(t)\, dt \tag{16}
\end{align}

So for the pricing equation we can define a transform $P$ that takes the future payoff
function $x_{t+1}(s)$ (defined for any future state $s$) and the pricing kernel $m_{t+1}(s, \beta)$ and does
an integration to compute the current equilibrium price $p_t(\beta)$ for any value of the subjective
discount factor $\beta$ (see footnote 1). Expressing this as an integral over all predicted future states $S$ (given
what we know at time $t$) we have the following:
\begin{align}
(P x_{t+1})(\beta) = p_t(\beta) = \int_S m_{t+1}(s, \beta)\, x_{t+1}(s)\, ds \tag{17}
\end{align}

The comments about the stochastic discount factor mt+1 also being called a “change
of measure” and “state-price density” are because we can consider mt+1 as a function that
transforms from real probabilities to “risk-neutral” probabilities where states of the world are
weighted by how much they matter to us (see Chapter 3 for more details). In this view mt+1
changes the measure from real probabilities to risk-neutral probabilities and then we simply
integrate future payoffs xt+1 within the measure of risk-neutral probabilities. Future states
with low consumption (and therefore high marginal utility) will tend to be overweighted
when using risk-neutral probabilities relative to future states that have high consumption.
This is because our concave utility function penalizes an incremental drop in consumption
of size ∆ more than it rewards an incremental boost in consumption of size ∆.
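The change of measure can be made concrete with a small discrete example. The probabilities, consumption levels, payoff, $\beta$ and $\gamma$ below are all assumptions chosen for illustration. Risk-neutral probabilities are formed as $\pi^*(s) = \pi(s)\, m_{t+1}(s) / E_t[m_{t+1}]$ and the price is then the risk-neutral expected payoff discounted at $R^f$.

```python
probs = [0.25, 0.50, 0.25]      # real-world probabilities (assumed)
c_next = [0.90, 1.00, 1.10]     # consumption tomorrow, with c_t normalized to 1 (assumed)
x = [2.0, 1.0, 0.5]             # an arbitrary payoff
beta, gamma = 0.98, 3.0         # assumed preference parameters

# Power-utility stochastic discount factor with c_t = 1.
m = [beta * c ** (-gamma) for c in c_next]
e_m = sum(pr * mi for pr, mi in zip(probs, m))      # E[m] = 1 / R^f

# Reweight each state by how much a unit of payoff matters in that state.
risk_neutral = [pr * mi / e_m for pr, mi in zip(probs, m)]

price_sdf = sum(pr * mi * xi for pr, mi, xi in zip(probs, m, x))   # E[m x]
price_rn = e_m * sum(q * xi for q, xi in zip(risk_neutral, x))     # E*[x] / R^f

for pr, q, c in zip(probs, risk_neutral, c_next):
    print(f"consumption {c:.2f}: real prob {pr:.3f} -> risk-neutral prob {q:.3f}")
print(f"price via E[m x] = {price_sdf:.6f}, via E*[x]/R^f = {price_rn:.6f}")
```

The low-consumption state receives more weight under the risk-neutral measure than under the real probabilities, and both pricing routes give the same number.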
NOTE 1: I don’t like the pricing kernel formulation because as mentioned earlier we
originally assumed pt was a constant. Even more importantly Eqn 17 says that we can
define a pricing kernel which computes pt , but the pricing kernel itself uses pt so there is a
hidden feedback loop in this equation (the pricing kernel mt+1 uses ct which uses pt ). To use
the pricing kernel formulation it would seem we would have to guess one version of pt for the
kernel, then compute a new version of pt , then maybe plug that back into the kernel, keep
iterating and hope we converge to something. See Appendix A for more details.

1.3 Payoffs, Prices and Notation


The term stationary essentially means that the distribution of a random variable does not
change over time. As that variable is sampled over and over there will be different values
each time (as there are different returns each day in the stock market) but the distribution
will remain stable.
The note about using the price-dividend ratio pt /dt to “look at prices but still examine
a stationary variables” does not make sense to me. Is there empirical evidence that price-
dividend ratios do not predictably trend across time that would make us believe that this
is a stationary variable? And how is this measure useful if it goes to infinity for any period
when the dividend value is zero? It would be useful to look at a case where we benefit from
using price-dividend ratio instead of returns, because currently I don’t see the value.
The expression zt = a − b(pt /dt ) is an example of a simple linear model with intercept a
and slope b. The a and b values could be generated by running a linear regression (future
returns regressed against price-dividend ratios in this example).
I’m not completely sure but I believe the $z_t$ variable represents shares invested in a specific
stock. So we could express the total dollar value of a given portfolio’s MSFT position as
the number of MSFT shares $z_t$ multiplied by the per-share dollar value $p_t$ of MSFT stock
(similarly $x_{t+1}$ would be the per-share dollar future payoff).

Footnote 1: We could more generally define $p_t$ over pairs of discount factors and risk aversion exponents $\{\beta, \gamma\}$.

1.4 Classic Issues in Finance


1.4.1 Risk-Free Rate
To derive the risk-free interest rate with zero uncertainty we use $u'(c) = c^{-\gamma}$, and the fact
that if there’s no uncertainty we can write the expectation of $c_{t+1}$ as a constant instead of a
random variable.

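\begin{align}
R^f_t &= 1/E_t[m_{t+1}] \tag{18} \\
R^f_t &= 1/E_t\left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} \right] \tag{19} \\
R^f_t &= 1/E_t\left[ \beta \frac{c_{t+1}^{-\gamma}}{c_t^{-\gamma}} \right] \tag{20} \\
R^f_t &= \frac{1}{\beta} \left( \frac{c_{t+1}}{c_t} \right)^{\gamma} \tag{21}
\end{align}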

Let’s derive equation 1.7 from the book. First, if you are interested, we can prove that for
a normally distributed variable $z$ with mean $\mu$ and variance $\sigma^2$, when that variable is used in
an exponential function the expected value is $E[e^z] = e^{\mu + \sigma^2/2}$. It is convenient to represent
$z$ using a “standard normal” $y$ that has mean 0 and variance 1, leading to $z = \mu + y\sigma$ (the
standard deviation $\sigma$ scales the $y$ variable). Then we can recast this problem by saying that
to find the expectation we are integrating $e^z = e^{\mu + \sigma y} = g(y)$ over variable $y$ which has a
simple “standard normal” probability density $f(y)$. The law of the unconscious statistician
says that we can compute the expectation using Eqn 22:
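\begin{align}
E[g(y)] &= \int_{-\infty}^{\infty} f(y)\, g(y)\, dy \tag{22} \\
E[g(y)] &= \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}} e^{-y^2/2} \right) e^{\mu + y\sigma}\, dy \tag{23} \\
E[e^z] &= e^{\mu} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{y\sigma - y^2/2}\, dy \tag{24} \\
&= e^{\mu + \sigma^2/2} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y^2 - 2y\sigma + \sigma^2)/2}\, dy \tag{25} \\
&= e^{\mu + \sigma^2/2} \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}} e^{-(y - \sigma)^2/2} \right) dy \tag{26} \\
&= e^{\mu + \sigma^2/2} \tag{27}
\end{align}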

We complete the square in Eqn 25 and rely on the fact that an offset normal distribution
integrates to 1 in Eqn 27.
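The closed form can be sanity-checked by evaluating the integral in Eqn 23 numerically. The sketch below uses a plain trapezoid rule; the values of $\mu$ and $\sigma$, the grid size and the integration width are arbitrary choices, and the helper name `expected_exp` is hypothetical.

```python
import math

def expected_exp(mu, sigma, n=20001, width=12.0):
    """Trapezoid-rule estimate of E[exp(mu + sigma * y)] for a standard normal y."""
    h = 2 * width / (n - 1)
    total = 0.0
    for i in range(n):
        y = -width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0      # trapezoid end-point weights
        density = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        total += weight * density * math.exp(mu + sigma * y) * h
    return total

mu, sigma = 0.02, 0.3       # arbitrary illustrative values
numeric = expected_exp(mu, sigma)
closed_form = math.exp(mu + sigma ** 2 / 2)
print(f"numerical integral = {numeric:.10f}")
print(f"exp(mu + sigma^2/2) = {closed_form:.10f}")
```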
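If consumption growth is log-normally distributed that means that the log of consumption
change $\Delta \ln c_{t+1} = \ln c_{t+1} - \ln c_t$ is normally distributed. For the below equations we use $\sigma_t^2[\cdot]$
to represent the expected variance of a random variable at time $t$, and we write the subjective
discount factor as $\beta = e^{-\delta}$. Note that for constant $a$
and random variable $X$ we have $\sigma_t^2[aX] = a^2 \sigma_t^2[X]$.

\begin{align}
R^f_t &= 1/E_t\left[ \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \right] \tag{28} \\
R^f_t &= 1/\left( e^{-\delta} E_t\left[ e^{(\Delta \ln c_{t+1})(-\gamma)} \right] \right) \tag{29} \\
R^f_t &= 1/\left[ e^{-\delta} e^{-\gamma E_t[\Delta \ln c_{t+1}] + (\gamma^2/2) \sigma_t^2[\Delta \ln c_{t+1}]} \right] \tag{30} \\
\ln(R^f_t) &= \ln\left( \left[ e^{-\delta - \gamma E_t[\Delta \ln c_{t+1}] + (\gamma^2/2) \sigma_t^2[\Delta \ln c_{t+1}]} \right]^{-1} \right) \tag{31} \\
r^f_t &= \delta + \gamma E_t[\Delta \ln c_{t+1}] - (\gamma^2/2) \sigma_t^2[\Delta \ln c_{t+1}] \tag{32}
\end{align}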
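To get a feel for the magnitudes, the sketch below evaluates Eqn 32 for assumed parameter values ($\delta = 0.01$, $\gamma = 2$, and log consumption growth with mean 0.02 and standard deviation 0.02) and compares it with $-\ln E_t[m_{t+1}]$ computed by direct numerical integration over the normal density. The function name `expected_sdf` is hypothetical.

```python
import math

delta, gamma = 0.01, 2.0                  # beta = exp(-delta); both values are assumptions
mean_growth, sd_growth = 0.02, 0.02       # E_t and sigma_t of log consumption growth

# Eqn 32: impatience + growth term - precautionary savings term
rf_formula = delta + gamma * mean_growth - 0.5 * gamma ** 2 * sd_growth ** 2

def expected_sdf(n=20001, width=10.0):
    """Trapezoid-rule estimate of E[exp(-delta - gamma * growth)], growth normal."""
    h = 2 * width / (n - 1)
    total = 0.0
    for i in range(n):
        y = -width + i * h                # standard normal draw
        weight = 0.5 if i in (0, n - 1) else 1.0
        growth = mean_growth + sd_growth * y
        density = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        total += weight * density * math.exp(-delta - gamma * growth) * h
    return total

rf_direct = -math.log(expected_sdf())     # ln R^f = -ln E[m]
print(f"r^f from Eqn 32           = {rf_formula:.6f}")
print(f"r^f from direct integration = {rf_direct:.6f}")
```

With these numbers the variance term lowers the rate only slightly (0.0008 out of roughly 0.05), which shows how small the precautionary savings effect is when consumption growth volatility is about 2%.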

NOTE 1: The comments regarding equations 1.6 and 1.7 in the book seem to be two
of the first places where we have subtly transitioned from micro to macro economics. The
book has subtly switched from talking about a stochastic discount factor mt+1 and risk-free
rate being defined per consumer, to discussing a single risk-free rate that is defined across
an entire economy. See Appendix B for further comments.
NOTE 2: Even with really simple equations like Eqn 21 I feel that some of the inferences
that are made in the book based on equilibrium equations are not convincing. See
Appendix A for further comments.

1.4.2 Risk Corrections


I think it is useful to dig down into what it means to measure the covariance between
future payoff ($x_{t+1}$) and discounted future marginal utility ($\beta u'(c_{t+1})$). This covariance
measurement is done over the space of all predicted future states $S$ given what we know at
time $t$. If we expand out equation 1.10 from the book we get the following. Note that this
equation only holds when the purchasing choice $\xi$ and consumption choices $c_t$ and expected
$c_{t+1}$ are at their optimal values.

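\begin{align}
p_t &= \frac{E_t[x_{t+1}]}{R^f} + \frac{\mathrm{cov}[\beta u'(c_{t+1}), x_{t+1}]}{u'(c_t)} \tag{33} \\
&= \frac{E_t[x_{t+1}]}{R^f} + \frac{\beta}{u'(c_t)} \Big( E_t[u'(c_{t+1}) x_{t+1}] - E_t[u'(c_{t+1})]\, E_t[x_{t+1}] \Big) \tag{34} \\
&= \frac{E_t[x_{t+1}]}{R^f} + \frac{\beta}{u'(c_t)} \left( \int_S u'(c_{t+1}(s))\, x_{t+1}(s)\, ds - \Big( \int_S u'(c_{t+1}(s))\, ds \Big) \Big( \int_S x_{t+1}(s)\, ds \Big) \right) \tag{35}
\end{align}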

The book has a passage stating “Marginal utility $u'(c_t)$ declines as $c$ rises. Thus, an
asset’s price is lowered if its payoff covaries with consumption.” However, the second part of
that sentence falls into a category of inferences that I do not find convincing (the first part
of the sentence comes from the concavity of the utility function). I explain my issues with
this specific inference in Appendix A.

The book says “For small (marginal) portfolio changes, the covariance between
consumption and payoff determines the effect of adding a bit more of each payoff on the
volatility of consumption”. This claim is based on noticing that in Eqn 36, in the limit as $\xi$ goes to 0,
the $\xi^2$ term becomes negligible relative to the covariance term (which is linear in $\xi$), making the covariance
term the only term that matters for marginal portfolio changes.

\begin{align}
\sigma^2[c + \xi x] = \sigma^2[c] + 2\xi\, \mathrm{cov}[c, x] + \xi^2 \sigma^2[x] \tag{36}
\end{align}

NOTE 1: I would personally add that when we talk about covariance between an asset’s
payoff and consumption that this applies to a specific consumer and their specific consumption
plan. This may not be emphasized in the book so that these ideas can be transferred to
macro economics later. See Appendix B for more details.

1.4.3 Idiosyncratic Risk Does Not Affect Prices


Let’s break down what is happening with the projection of future payoff $x_{t+1}$ onto the
stochastic discount factor $m_{t+1}$ in Eqn 37.

\begin{align}
x_{t+1} = \mathrm{proj}(x_{t+1} | m_{t+1}) + \varepsilon_{t+1} \tag{37}
\end{align}

Let’s go through an example (shown in Figure 1) of geometric projection using vectors
before we talk about projection of random variables. There are a few properties that hold in
Figure 1 that also hold for Eqn 37. First, as the equation states clearly $x_{t+1}$ can be split into
a projected component and an $\varepsilon_{t+1}$ component. Second, the projection onto $m_{t+1}$ is simply
$m_{t+1}$ multiplied by some scalar value. Third, $\varepsilon_{t+1}$ is perpendicular to $m_{t+1}$ (we will discuss
what that means with random variables later).

Figure 1: This shows vector analogs of the various quantities for Eqn 37. Note that mt+1 is
the vector that spans from the lower left corner all the way to the far right, accounting for
both blue and green colors.

The fraction on the right side of Eqn 38 will be a scalar value that scales the magnitude
of the mt+1 random variable. The expectations computed in the numerator and denominator
are each scalars that are equivalent to dot products in linear algebra. If we think of the set
of all predicted future states S we can write out the expectation as integrals in Eqn 39.

 
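\begin{align}
\mathrm{proj}(x_{t+1} | m_{t+1}) &= \left( \frac{E_t(m_{t+1} x_{t+1})}{E_t(m_{t+1}^2)} \right) m_{t+1} \tag{38} \\
\mathrm{proj}(x_{t+1} | m_{t+1}) &= \left( \frac{\int_S m_{t+1}(s)\, x_{t+1}(s)\, ds}{\int_S m_{t+1}(s)^2\, ds} \right) m_{t+1} \tag{39}
\end{align}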

If there were three equally weighted future states we could think of $a$ and $b$ being random
variables with three states, being linear algebra vectors $\mathbf{a}$ and $\mathbf{b}$ with three dimensions, or
three dimensional geometric vectors. The dot product operation (which is the building block
for projection) in each domain is written as follows: $E[ab] \leftrightarrow \mathbf{a}^T \mathbf{b} \leftrightarrow (\vec{a} \cdot \vec{b})$. This connects
the intuitive geometric representation with the vector and probabilistic representation. When
two random variables are orthogonal it means their dot product is zero, whereas in geometric
space it means the vectors are perpendicular.
To verify that the expected price of the residuals is zero we have:

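\begin{align}
E_t[m_{t+1} \varepsilon_{t+1}] &= E_t\big[ m_{t+1} \big( x_{t+1} - \mathrm{proj}(x_{t+1} | m_{t+1}) \big) \big] \tag{40} \\
&= E_t\left[ m_{t+1} \left( x_{t+1} - \frac{E_t[m_{t+1} x_{t+1}]}{E_t[m_{t+1}^2]}\, m_{t+1} \right) \right] \tag{41} \\
&= E_t\left[ m_{t+1} x_{t+1} - \frac{E_t[m_{t+1} x_{t+1}]\, m_{t+1}^2}{E_t[m_{t+1}^2]} \right] \tag{42} \\
&= E_t[m_{t+1} x_{t+1}] - \frac{E_t[m_{t+1} x_{t+1}]}{E_t[m_{t+1}^2]}\, E_t[m_{t+1}^2] \tag{43} \\
&= E_t[m_{t+1} x_{t+1}] - E_t[m_{t+1} x_{t+1}] \tag{44} \\
&= 0 \tag{45}
\end{align}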
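The projection and its zero-price residual can be computed directly for three equally weighted states. The values of $m_{t+1}$ and $x_{t+1}$ below are made up for illustration. The sketch builds the projection from Eqn 38, forms the residual from Eqn 37, and checks that the residual has zero price, so the payoff and its projection have the same price.

```python
# Three equally weighted states; the values of m and x are assumptions.
probs = [1 / 3, 1 / 3, 1 / 3]
m = [1.10, 0.95, 0.80]      # stochastic discount factor in each state
x = [0.50, 1.00, 1.60]      # payoff in each state

def expect(values):
    """Probability-weighted average, the analog of a dot product."""
    return sum(pr * v for pr, v in zip(probs, values))

# Eqn 38: the projection is m scaled by E[m x] / E[m^2].
scale = expect([mi * xi for mi, xi in zip(m, x)]) / expect([mi * mi for mi in m])
proj = [scale * mi for mi in m]

# Eqn 37: the residual is whatever part of x the projection misses.
eps = [xi - pi for xi, pi in zip(x, proj)]

price_x = expect([mi * xi for mi, xi in zip(m, x)])
price_proj = expect([mi * pi for mi, pi in zip(m, proj)])
price_eps = expect([mi * ei for mi, ei in zip(m, eps)])

print(f"scale = {scale:.6f}")
print(f"p(x) = {price_x:.6f}, p(proj) = {price_proj:.6f}, p(eps) = {price_eps:.2e}")
```

The residual is not zero state by state; it is only orthogonal to $m_{t+1}$, which is what makes its price vanish.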

The phrase “projection means linear regression without a constant” was a little unclear
to me. I would say that projecting x onto m is the same as computing a linear regression
without a constant (aka intercept) that lets us predict x given values of m, then using the
computed slope to linearly scale m so that it is as close to the original x as possible. Note
that following this process results in all elements of m being linearly scaled by the same
constant. Using vector notation for regression we have $x = m\beta + \varepsilon$ where $\beta$ is the scale or
slope that we use to predict values of $x$ based on $m$ (a regression slope here, not the subjective
discount factor), and $\varepsilon$ is the error term. After minimizing
the squared error term (see other resources for linear regression derivation) we can solve for
slope $\beta$ as $\beta = (m^T m)^{-1} m^T x$ and the predicted values are $m\beta$. You can see that $m\beta$ is the
geometric equivalent to the right hand side of Eqn 38.
This chapter mentions “APT” which refers to the arbitrage pricing theory which assumes
that markets are efficient and there is no ability to arbitrage asset prices (aka make money
with zero risk).
NOTE 1: This chapter already says that idiosyncratic risk is based on the stochastic
discount factor, but I think it’s worth emphasizing that different consumers can have
different discount factors. That means that whether an asset has idiosyncratic risk can be
evaluated differently by each different consumer, it is not a fundamental property of the
asset.

NOTE 2: My earlier note in Section 1.1 about making sure we define default future
consumption et+1 (see Eqn 3) as a random variable instead of a simple scalar is critical now
that we have started discussing assets that act like insurance. If et+1 is a random variable
then even if we purchase zero of the asset in question our planned consumption may change
based on future events (e.g. if a tornado hits our house our consumption goes down). This
gives value to purchasing an insurance asset to smooth out our future consumption $c_{t+1}$ (e.g.
tornado insurance). If we assume that $e_{t+1}$ is a fixed scalar then there is no point in buying
any type of insurance asset because our future consumption by default has zero variance
across all possible future states.
NOTE 3: Making a function p that gives the current price for a given future payoff
xt+1 and then using that function to compute p(proj(xt+1 |mt+1 )) again appears to me to be
circular logic. We use the fixed price pt at time t (assumed to be a constant) to compute mt+1 ,
and then $m_{t+1}$ is used to compute the projection which is then passed to the new function $p$
which computes the price at time t. Now we have two possibly inconsistent representations
for the price at time t: function p(x) and the constant pt used to define mt+1 . See Appendix A
for more details.

1.4.4 Expected Return-Beta Representation


From equation 1.14 in the book we have:

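\begin{align}
E_t[R^i] &= R^f + \left( \frac{\mathrm{cov}[R^i, m_{t+1}]}{\mathrm{var}(m_{t+1})} \right) \left( -\frac{\mathrm{var}(m_{t+1})}{E_t[m_{t+1}]} \right) \tag{46} \\
E_t[R^i] &= R^f + \beta_{i,m} \lambda_m \tag{47}
\end{align}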

The book says that $\beta_{i,m}$ is the regression coefficient computed using $\mathrm{cov}(R^i, m_{t+1})/\mathrm{var}(m_{t+1})$.
Both Eqn 46 and Eqn 38 are described as relating to regression, but they are slightly different.
For Eqn 46 the numerator used to calculate the linear regression coefficient is $\mathrm{cov}(R^i, m_{t+1}) =
E[R^i m_{t+1}] - E[R^i] E[m_{t+1}]$, whereas in Eqn 38 the numerator was $E[m_{t+1} x_{t+1}]$ without
subtracting any $E[m_{t+1}] E[x_{t+1}]$ term. The reason we have two different numerators to compute
regression coefficients is that for Eqn 38 we were doing projection, which is linear regression
“without a constant”. In the return-beta representation we are calculating the slope of the
linear regression assuming that there will be an intercept term in the model (from Eqn 46
you can see that the intercept is $R^f$).
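The difference between the two slopes shows up clearly in a small discrete example. The three equally likely states and the values of $m_{t+1}$ and the payoff below are assumptions. The sketch converts the payoff into a return (a payoff with price one), checks that Eqn 46 holds exactly, and compares $\beta_{i,m}$ with the no-constant slope $E[m_{t+1} R^i]/E[m_{t+1}^2]$ used for projection.

```python
# Three equally likely states; all values are illustrative assumptions.
probs = [1 / 3, 1 / 3, 1 / 3]
m = [1.10, 0.95, 0.80]      # stochastic discount factor
x = [0.50, 1.00, 1.60]      # payoff of asset i

def expect(values):
    """Probability-weighted average over the states."""
    return sum(pr * v for pr, v in zip(probs, values))

price = expect([mi * xi for mi, xi in zip(m, x)])
R = [xi / price for xi in x]                     # gross return: payoff with price one
risk_free = 1 / expect(m)

e_Rm = expect([ri * mi for ri, mi in zip(R, m)])  # equals 1 by construction
var_m = expect([mi ** 2 for mi in m]) - expect(m) ** 2
cov_Rm = e_Rm - expect(R) * expect(m)

beta_im = cov_Rm / var_m                         # slope when the regression has an intercept
lambda_m = -var_m / expect(m)                    # same for every asset priced by this m
slope_no_const = e_Rm / expect([mi ** 2 for mi in m])   # projection slope (no intercept)

print(f"E[R^i]                 = {expect(R):.6f}")
print(f"R^f + beta * lambda    = {risk_free + beta_im * lambda_m:.6f}")
print(f"beta (with intercept)  = {beta_im:.4f}")
print(f"slope (no intercept)   = {slope_no_const:.4f}")
```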
The book suggests taking the Taylor expansion of Eqn 46, which I believe means taking
the Taylor expansion of $e^x$ around 0 to get $e^x \approx 1 + x$. In the 35150 “Week 5 Asset Pricing
Theory” notes, the section on “Risk and betas” derives a result with an additional
approximation that $R^f \approx 1$. For this case we define $\Delta c_{t+1} = \ln(c_{t+1}/c_t)$, and as stated in
the book $\lambda_{\Delta c} = \gamma\, \mathrm{var}[\Delta c_{t+1}]$.

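\begin{align}
m_{t+1} &= \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \tag{48} \\
m_{t+1} &= e^{-\delta} e^{\ln\left[ (c_{t+1}/c_t)^{-\gamma} \right]} \tag{49} \\
m_{t+1} &= e^{-\delta} e^{-\gamma \Delta c_{t+1}} \tag{50} \\
m_{t+1} &\approx 1 - \delta - \gamma \Delta c_{t+1} \tag{51} \\
E_t[R^i] &= R^f + \left( \frac{\mathrm{cov}[R^i, m_{t+1}]}{\mathrm{var}[m_{t+1}]} \right) \left( -\frac{\mathrm{var}[m_{t+1}]}{E_t[m_{t+1}]} \right) \tag{52} \\
E_t[R^i] &= R^f - \left( \frac{\mathrm{cov}[R^i, m_{t+1}]}{E_t[m_{t+1}]} \right) \tag{53} \\
E_t[R^i] &= R^f - R^f \mathrm{cov}[R^i, m_{t+1}] \tag{54} \\
E_t[R^i] &\approx R^f - R^f \mathrm{cov}[R^i, 1 - \delta - \gamma \Delta c_{t+1}] \tag{55} \\
E_t[R^i] &= R^f - R^f (-\gamma)\, \mathrm{cov}[R^i, \Delta c_{t+1}] \tag{56} \\
E_t[R^i] &= R^f + \mathrm{cov}[R^i, \Delta c_{t+1}] (R^f \gamma) \tag{57} \\
E_t[R^i] &= R^f + \left( \frac{\mathrm{cov}[R^i, \Delta c_{t+1}]}{\mathrm{var}[\Delta c_{t+1}]} \right) \left( R^f \gamma\, \mathrm{var}[\Delta c_{t+1}] \right) \tag{58} \\
E_t[R^i] &\approx R^f + \left( \frac{\mathrm{cov}[R^i, \Delta c_{t+1}]}{\mathrm{var}[\Delta c_{t+1}]} \right) \left( \gamma\, \mathrm{var}[\Delta c_{t+1}] \right) \tag{59} \\
E_t[R^i] &= R^f + \beta_{i,\Delta c} \lambda_{\Delta c} \tag{60}
\end{align}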

I’m not completely sure why we use the Rf ≈ 1 approximation when defining λ∆c ,
since Eqn 60 already refers to an Rf value either defined externally or by estimating Rf =
1/Et [mt+1 ].
NOTE 1: The book states “Notice that the coefficient λm is the same for all assets i”.
I would expand this to say that for a specific consumer’s optimal consumption plan (which
affects the stochastic discount factor mt+1 ) the λm variable is constant across all assets. See
Appendix B for discussion of micro vs. macro economic views of these equations.

1.4.5 Mean-Variance Frontier


A more detailed derivation of equation 1.17 in the book (using the fact that the correlation
ρ is always bounded such that |ρ| ≤ 1 as proven in Eqn 84, and that E[XY ] = E[X]E[Y ] +
ρ[X, Y ]σ[X]σ[Y ] by taking cov[X, Y ] = E[XY ] − E[X]E[Y ], rearranging and scaling to use
correlation instead):

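\begin{align}
1 &= E_t[m_{t+1} R^i] \tag{61} \\
1 &= E_t[m_{t+1}] E_t[R^i] + \rho_{m_{t+1}, R^i}\, \sigma_t[m_{t+1}]\, \sigma_t[R^i] \tag{62} \\
\frac{1}{E_t[m_{t+1}]} &= E_t[R^i] + \rho_{m_{t+1}, R^i} \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \sigma_t[R^i] \tag{63} \\
R^f &= E_t[R^i] + \rho_{m_{t+1}, R^i} \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \sigma_t[R^i] \tag{64} \\
R^f - E_t[R^i] &= \rho_{m_{t+1}, R^i} \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \sigma_t[R^i] \tag{65} \\
|R^f - E_t[R^i]| &\leq \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \sigma_t[R^i] \tag{66} \\
\left| \frac{E_t[R^i] - R^f}{\sigma_t[R^i]} \right| &\leq \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \tag{67}
\end{align}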

Eqn 67 shows a bound on the Sharpe ratio. Rearranging terms of Eqn 64 gives equation
1.18 in the book:
\begin{align}
E_t[R^i] = R^f - \rho_{m_{t+1}, R^i} \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]} \sigma_t[R^i] \tag{68}
\end{align}

Similarly to Et [·] I have denoted the expected standard deviation of a random variable
at time t as σt [·]. Note that the correlation ρmt+1 ,Ri is also computed using our knowledge
at time t.
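The Sharpe ratio bound in Eqn 67 can be checked on a handful of payoffs. The three equally likely states, the values of $m_{t+1}$ and the payoffs are invented for illustration. The last payoff, $2 - m_{t+1}$, is linear in $m_{t+1}$ and therefore perfectly correlated with it, so it should sit on the bound.

```python
import math

# Three equally likely states; m and the payoffs are illustrative assumptions.
probs = [1 / 3, 1 / 3, 1 / 3]
m = [1.10, 0.95, 0.80]

def expect(values):
    """Probability-weighted average over the states."""
    return sum(pr * v for pr, v in zip(probs, values))

def std(values):
    """Standard deviation across states."""
    return math.sqrt(expect([v ** 2 for v in values]) - expect(values) ** 2)

risk_free = 1 / expect(m)
bound = std(m) / expect(m)          # right-hand side of Eqn 67

def sharpe(payoff):
    """|E[R] - R^f| / sigma[R] for the return R = payoff / price."""
    price = expect([mi * xi for mi, xi in zip(m, payoff)])
    R = [xi / price for xi in payoff]
    return abs(expect(R) - risk_free) / std(R)

payoffs = {
    "stock-like": [0.50, 1.00, 1.60],
    "insurance-like": [2.00, 0.50, 0.20],
    "unrelated": [1.00, 0.40, 1.30],
    "frontier (2 - m)": [2 - mi for mi in m],
}

print(f"bound sigma[m]/E[m] = {bound:.6f}")
for name, payoff in payoffs.items():
    print(f"{name:18s} Sharpe ratio = {sharpe(payoff):.6f}")
```

Every payoff respects the bound, and only the one that is linear in $m_{t+1}$ attains it, which is the sense in which frontier returns are perfectly correlated with the discount factor.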
I found the phrase “we can go beyond perfect correlation” to be slightly confusing. I
believe it means that there are other derivations that we can use that do not rely on the
correlation between mt+1 and Ri , but rather we can examine the mean variance frontier by
generating assets focusing only on mt+1 .
It is important to remember that the risk-free rate Rf is a scalar equal to 1/Et [mt+1 ],
whereas other returns are random variables with random payoffs and a price of 1. In other
words the standard deviation of Rf will always be 0, whereas for most other returns the
standard deviation will be positive.
The book uses the fact that perfect correlation or anti-correlation (correlation coefficient
of 1 or -1) between random variables X and Y implies that for some a and b we have
Y = aX + b. Because correlation looks at demeaned results and accounts for the scale of X
and Y this may seem obvious, but the proof is written below in case this is useful.
First we show that two random variables have a correlation of 1 or -1 if and only if $\mathrm{cov}[X, Y]^2 =
\mathrm{var}[X]\, \mathrm{var}[Y]$. Peeking ahead, Eqn 81 can be used to prove that the correlation coefficient
always lies between -1 and 1. First the case when the absolute value of correlation is equal
to 1:
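\begin{align}
|\rho_{X,Y}| &= 1 \tag{69} \\
\left| \frac{\mathrm{cov}[X, Y]}{\sigma[X] \sigma[Y]} \right| &= 1 \tag{70} \\
|\mathrm{cov}[X, Y]| &= \sigma[X] \sigma[Y] \tag{71} \\
\mathrm{cov}[X, Y]^2 &= \mathrm{var}[X]\, \mathrm{var}[Y] \tag{72}
\end{align}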
Now the case where correlation is less than 1:
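\begin{align}
|\rho_{X,Y}| &< 1 \tag{74} \\
\left| \frac{\mathrm{cov}[X, Y]}{\sigma[X] \sigma[Y]} \right| &< 1 \tag{75} \\
|\mathrm{cov}[X, Y]| &< \sigma[X] \sigma[Y] \tag{76} \\
\mathrm{cov}[X, Y]^2 &< \mathrm{var}[X]\, \mathrm{var}[Y] \tag{77}
\end{align}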
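The next step is to show that $\operatorname{cov}[X,Y]^2 = \operatorname{var}[X]\operatorname{var}[Y]$ if and only if $X$ is linearly related to $Y$. This involves computing the variance of $dX + Y$, creating a quadratic in $d$ and examining the discriminant. To start we know that a variance is always non-negative.
\begin{align}
0 &\le \operatorname{var}[dX + Y] \tag{78}\\
0 &\le \operatorname{var}[X]\,d^2 + 2\operatorname{cov}[X,Y]\,d + \operatorname{var}[Y] \tag{79}
\end{align}
The last expression is a quadratic in $d$, $Q(d) = ad^2 + bd + c$ with $a = \operatorname{var}[X]$, $b = 2\operatorname{cov}[X,Y]$ and $c = \operatorname{var}[Y]$, and $Q(d) \ge 0$ for every $d$. A quadratic that never goes negative has at most one real root, so its discriminant is non-positive, $0 \ge b^2 - 4ac$.
\begin{align}
0 &\ge 4\operatorname{cov}[X,Y]^2 - 4\operatorname{var}[X]\operatorname{var}[Y] \tag{80}\\
\operatorname{cov}[X,Y]^2 &\le \operatorname{var}[X]\operatorname{var}[Y] \tag{81}
\end{align}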
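Taking a short detour we can prove that the correlation is always bounded between -1 and 1:
\begin{align}
\frac{\operatorname{cov}[X,Y]^2}{\operatorname{var}[X]\operatorname{var}[Y]} &\le 1 \tag{82}\\
\left| \frac{\operatorname{cov}[X,Y]}{\sqrt{\operatorname{var}[X]\operatorname{var}[Y]}} \right| &\le 1 \tag{83}\\
|\rho[X,Y]| &\le 1 \tag{84}
\end{align}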
The equality case where $\operatorname{cov}[X,Y]^2 = \operatorname{var}[X]\operatorname{var}[Y]$ can only happen when the discriminant is exactly zero, meaning $\operatorname{var}[dX + Y] = 0$ for some value of $d$. When the variance of an expression is zero the value of the expression must be equal to some constant $n$ as shown below.
\begin{align}
\operatorname{var}[dX + Y] &= 0 \tag{85}\\
dX + Y &= n \tag{86}\\
Y &= -dX + n \tag{87}
\end{align}
Connecting these two logical points we can now say that two variables are perfectly correlated
(correlation coefficient of 1 or -1) if and only if they are linearly related to each other.
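A quick numerical illustration of the two facts just proved, using simulated data (the sample size, coefficients and helper names are arbitrary choices for this sketch): exactly linear pairs give a correlation of 1 or -1 and a zero discriminant, while adding independent noise pushes the correlation strictly inside the interval and makes the discriminant negative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
noise = rng.normal(size=10_000)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def discriminant(a, b):
    """Discriminant of Q(d) = var[X] d^2 + 2 cov[X, Y] d + var[Y]."""
    cov = np.cov(a, b)  # 2x2 sample covariance matrix
    return float(4.0 * cov[0, 1] ** 2 - 4.0 * cov[0, 0] * cov[1, 1])

rho_up = corr(x, 3.0 * x + 2.0)        # Y = aX + b with a > 0
rho_down = corr(x, -0.5 * x + 7.0)     # Y = aX + b with a < 0
rho_noisy = corr(x, 3.0 * x + noise)   # not exactly linear

disc_linear = discriminant(x, 3.0 * x + 2.0)
disc_noisy = discriminant(x, 3.0 * x + noise)

print(rho_up, rho_down, rho_noisy, disc_linear, disc_noisy)
```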
Note that all possible frontier returns can be expressed as $R^{mv} = a + b\,m_{t+1}$, but that does not mean that every combination of $a$ and $b$ in that expression produces a valid payoff. For instance $2R^f + 0\,m_{t+1}$ is simply twice the risk-free rate with zero risk, which is not a valid return (the only compensation for zero risk that does not break our model is the risk-free rate itself).
For this reason I think the book is incorrect to say that perfect correlation implies that we can span all frontier returns based on the difference between two frontier returns. The random values $(1 + m_{t+1})$, $(2 + m_{t+1})$, $(1 + 2m_{t+1})$ are all perfectly correlated but the difference between any two of them does not span the third. Eqn 68 and perfect correlation are both needed to justify the following equation from the book (which has one free variable $a$):
\begin{equation}
R^{mv} = R^f + a(R^m - R^f) \tag{88}
\end{equation}
To prove that Eqn 88 must be true, recall that every return that is perfectly correlated with the stochastic discount factor is linear in $m_{t+1}$, so a new frontier return $R^{mv}$ must be linearly related to an existing frontier return $R^m$, that is $R^{mv} = b + aR^m$. If we plug that into Eqn 68 and assume that the correlation coefficient is 1 (instead of -1), taking $a > 0$ so that $\sigma_t[aR^m + b] = a\,\sigma_t[R^m]$, we have:
\begin{align}
E_t[R^{mv}] &= R^f - \rho_{m_{t+1},R^{mv}}\,\frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]}\,\sigma_t[R^{mv}] \tag{89}\\
E_t[aR^m + b] &= R^f - \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]}\,\sigma_t[aR^m + b] \tag{90}\\
aE_t[R^m] + b &= R^f - \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]}\,a\,\sigma_t[R^m] \tag{91}\\
E_t[R^m] &= \frac{R^f - b}{a} - \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]}\,\sigma_t[R^m] \tag{92}
\end{align}
Since we assumed $R^m$ is a return on the frontier, and with the assumption that the correlation coefficient is again 1, we can also write:
\begin{equation}
E_t[R^m] = R^f - \frac{\sigma_t[m_{t+1}]}{E_t[m_{t+1}]}\,\sigma_t[R^m] \tag{93}
\end{equation}
With these two equations taken together we get the following. If we change our assumptions such that one of $R^{mv}$ or $R^m$ has a correlation coefficient equal to -1 the sign of $a$ may flip but the result is still the same.
\begin{align}
R^f &= \frac{R^f - b}{a} \tag{94}\\
b &= R^f - aR^f \tag{95}\\
R^{mv} &= b + aR^m \tag{96}\\
R^{mv} &= R^f - aR^f + aR^m \tag{97}\\
R^{mv} &= R^f + a(R^m - R^f) \tag{98}
\end{align}
The fact that any mean-variance return carries all pricing information is saying that if
we have the risk-free rate Rf and all of the future payoff information for another return on
the frontier Rmv then we can derive the stochastic discount factor mt+1 . Note that while any
single return Rmv can be expressed as Rmv = c + dmt+1 different returns may use different
coefficients for both c and d. Problem 3 at the end of this chapter gives a hint, saying that
it is easiest to parameterize mt+1 using the demeaned value of Rmv . Note that Eqn 105 is
simply a simplified version of Eqn 46 that helps us resolve the value of b.

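\begin{align}
m_{t+1} &= a + bR^{mv} \tag{99}\\
m_{t+1} &= c + b(R^{mv} - E_t[R^{mv}]) \tag{100}\\
E_t[m_{t+1}] &= c + E_t[b(R^{mv} - E_t[R^{mv}])] \tag{101}\\
E_t[m_{t+1}] &= c \tag{102}\\
m_{t+1} &= E_t[m_{t+1}] + b(R^{mv} - E_t[R^{mv}]) \tag{103}\\
m_{t+1} &= \frac{1}{R^f} + bR^{mv} - bE_t[R^{mv}] \tag{104}\\
E_t[R^{mv}] &= R^f - \frac{\operatorname{cov}[R^{mv}, m_{t+1}]}{E_t[m_{t+1}]} \tag{105}\\
E_t[R^{mv}] &= R^f - (R^f b)\operatorname{var}[R^{mv}] \tag{106}\\
b &= \frac{R^f - E_t[R^{mv}]}{R^f \operatorname{var}[R^{mv}]} \tag{107}\\
b &= \frac{R^f - E_t[R^{mv}]}{R^f E_t[(R^{mv} - E_t[R^{mv}])^2]} \tag{108}\\
a &= \frac{1}{R^f} - bE_t[R^{mv}] \tag{109}
\end{align}
The intercept in Eqn 109 follows from matching the constant terms of Eqn 99 and Eqn 104.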
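As a sanity check on the claim that one frontier return plus the risk-free rate carries all pricing information, the sketch below builds a frontier return from a made-up discrete discount factor and then recovers the discount factor from that return alone. The states, probabilities and payoff are hypothetical. The slope b comes from Eqn 107, and the intercept is the one implied by matching Eqn 99 with Eqn 104, namely a = 1/R^f − bE[R^mv].

```python
import numpy as np

probs = np.array([0.25, 0.25, 0.25, 0.25])   # hypothetical state probabilities
m = np.array([1.20, 1.00, 0.90, 0.80])       # hypothetical discount factor

def E(x):
    return float(np.sum(probs * x))

def var(x):
    return E((x - E(x)) ** 2)

rf = 1.0 / E(m)   # risk-free rate R^f = 1 / E[m]

# Build a frontier return: linear in m and scaled so its price E[m R] is 1.
payoff = 3.0 - m
r_mv = payoff / E(m * payoff)

# Recover m using only rf and the distribution of r_mv.
b = (rf - E(r_mv)) / (rf * var(r_mv))   # Eqn 107
a = 1.0 / rf - b * E(r_mv)              # intercept from matching Eqn 99 to Eqn 104
m_recovered = a + b * r_mv

print(m_recovered)
```

If the recovered values match the original m state by state, the parameterization is internally consistent for this example.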

The single-beta representation uses the following derivation (with a different constant a):

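\begin{align}
R^i &= R^f + a(R^{mv} - R^f) \tag{110}\\
R^i &= R^f + aR^{mv} - aR^f \tag{111}\\
\operatorname{cov}[R^i, R^{mv}] &= a\,\operatorname{cov}[R^{mv}, R^{mv}] \tag{112}\\
\frac{\operatorname{cov}[R^i, R^{mv}]}{\operatorname{var}[R^{mv}]} &= a\,\frac{\operatorname{var}[R^{mv}]}{\operatorname{var}[R^{mv}]} \tag{113}\\
\beta_{i,mv} &= a \tag{114}\\
R^i &= R^f + \beta_{i,mv}(R^{mv} - R^f) \tag{115}\\
E_t[R^i] &= R^f + \beta_{i,mv}\left(E_t[R^{mv}] - R^f\right) \tag{116}
\end{align}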

Note that once again $\lambda = E_t[R^{mv}] - R^f$ is a constant (requiring one frontier return $R^{mv}$) independent of which asset $R^i$ we are pricing.
The comment about graphing mean vs. betas is saying that if you change the graph to use $\beta_{i,mv}$ along the horizontal axis instead of $\sigma[R^i]$ all assets will fall along a single line. This line will be the upper edge of the mean-variance frontier and it will continue into negative values of $\beta$ (note that $\beta$ can be negative whereas standard deviation $\sigma$ cannot). If we look at the original graph in the book (using $\sigma[R^i]$ as the horizontal axis) all payoffs with zero beta will exist along the $E[R^i] = R^f$ horizontal line. This gives us the intuition that by grouping assets by beta values instead of sigma (standard deviation) values we are essentially throwing out information about idiosyncratic risk. When the graph uses the standard deviation of return along the horizontal axis (as shown in Figure 1.1 in the book) different payoffs with different levels of idiosyncratic risk have different points on the graph and the collection of all points fills a wedge-shaped region. If we made a graph using beta along the horizontal axis all assets that have the same expected return collapse into a single point regardless of idiosyncratic risk and the set of all assets forms a line.
The comment about not wanting to put your whole portfolio in an inefficient asset (an
asset that does not lie on the mean-variance frontier) is confusing to me. In my mind we
have gone through a decent amount of trouble to point out that idiosyncratic risk has zero
compensation which means the consumer does not care about it at all. I think I would
emotionally prefer to have an efficient asset versus having additional idiosyncratic risk that
behaves like a combination of insurance and more risk, but our model says that a purely
rational consumer should not care.

1.4.6 Slope of the Mean-Standard Deviation Frontier and Equity Premium Puzzle
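We can take the Sharpe ratio bound from Eqn 67 and look at it with a power utility function as shown in Eqn 117:
\begin{equation}
\left| \frac{E_t[R^{mv} - R^f]}{\sigma_t[R^{mv}]} \right| \le \frac{\sigma_t[(c_{t+1}/c_t)^{-\gamma}]}{E_t[(c_{t+1}/c_t)^{-\gamma}]} \tag{117}
\end{equation}

The slope of the mean-standard deviation frontier lines is simply the equality relationship as shown in Eqn 118:
\begin{equation}
\left| \frac{E_t[R^{mv} - R^f]}{\sigma_t[R^{mv}]} \right| = \frac{\sigma_t[(c_{t+1}/c_t)^{-\gamma}]}{E_t[(c_{t+1}/c_t)^{-\gamma}]} \tag{118}
\end{equation}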

The book says that “the standard deviation on the right hand side is large if consumption is volatile or γ is large”. However, that is not strictly true. If consumption is volatile but future consumption is strictly greater than current consumption, then as γ becomes larger and larger the standard deviation will decrease and eventually trend towards 0. As an example, if future consumption is equally likely to be 110%, 120% or 130% of current consumption then the standard deviation of $(c_{t+1}/c_t)^{-\gamma}$ increases as γ increases from 1 to around 6, but after that the standard deviation steadily decreases. At very large γ values, say γ = 100, we are taking the standard deviation of $1/1.1^{100}$, $1/1.2^{100}$ and $1/1.3^{100}$ (which are 7.25657159e-05, 1.20746735e-08, and 4.03333940e-12 respectively). Very large gamma values tend to discount differences in positive consumption change ($(c_{t+1}/c_t) > 1$) and greatly amplify the contribution of possible negative consumption changes.
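The three-outcome example can be reproduced in a few lines. This is a sketch using population (not sample) standard deviations, with γ restricted to the integers 1 to 100; both choices are assumptions of the illustration. It also reports the ratio of standard deviation to mean, since that ratio (not the standard deviation alone) is what appears on the right hand side of Eqn 118.

```python
import numpy as np

growth = np.array([1.10, 1.20, 1.30])   # equally likely gross consumption growth
gammas = np.arange(1, 101)

def sdf_stats(gamma):
    """Population std and mean of (c_{t+1}/c_t)^(-gamma) over the three outcomes."""
    x = growth ** (-float(gamma))
    return float(x.std()), float(x.mean())

stds = np.array([sdf_stats(g)[0] for g in gammas])
ratios = np.array([sdf_stats(g)[0] / sdf_stats(g)[1] for g in gammas])

peak_gamma = int(gammas[np.argmax(stds)])   # gamma where the std is largest
print("std peaks near gamma =", peak_gamma)
print("std at gamma = 100:", stds[-1])
print("std / mean at gamma = 1 and 100:", ratios[0], ratios[-1])
```

In this particular example the standard deviation itself collapses for large γ, while the ratio of standard deviation to mean is larger at γ = 100 than at γ = 1, so the objection applies to the numerator taken alone.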
To derive equation 1.20 in the book we can use the hints given in the problems at the end of the chapter: $\sigma^2(x) = E[x^2] - E[x]^2$, for a normal variable $z$ we have $E[e^z] = e^{E[z] + (1/2)\sigma^2[z]}$ (as proven in Eqn 27), and also remember that for a constant $a$ we have $\sigma^2(aX) = a^2\sigma^2(X)$.
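Using $\ln(c_{t+1}/c_t)$ as $\Delta \ln c$ we have:
\begin{align}
\left| \frac{E_t[R^{mv} - R^f]}{\sigma_t[R^{mv}]} \right|
&= \frac{\sigma_t[(c_{t+1}/c_t)^{-\gamma}]}{E_t[(c_{t+1}/c_t)^{-\gamma}]} \tag{119}\\
&= \frac{\sqrt{E_t[(c_{t+1}/c_t)^{-2\gamma}] - E_t[(c_{t+1}/c_t)^{-\gamma}]^2}}{E_t[(c_{t+1}/c_t)^{-\gamma}]} \tag{120}\\
&= \frac{\sqrt{E_t[e^{(\Delta \ln c)(-2\gamma)}] - E_t[e^{(\Delta \ln c)(-\gamma)}]^2}}{E_t[e^{(\Delta \ln c)(-\gamma)}]} \tag{121}\\
&= \frac{\sqrt{e^{E[-2\gamma\Delta \ln c] + (1/2)\sigma^2[-2\gamma\Delta \ln c]} - \left(e^{E[-\gamma\Delta \ln c] + (1/2)\sigma^2[-\gamma\Delta \ln c]}\right)^2}}{e^{E[-\gamma\Delta \ln c] + (1/2)\sigma^2[-\gamma\Delta \ln c]}} \tag{122}\\
&= \frac{\sqrt{e^{E[-2\gamma\Delta \ln c]}\, e^{2\gamma^2\sigma^2[\Delta \ln c]} - e^{E[-2\gamma\Delta \ln c]}\, e^{\gamma^2\sigma^2[\Delta \ln c]}}}{e^{E[-\gamma\Delta \ln c]}\, e^{(\gamma^2/2)\sigma^2[\Delta \ln c]}} \tag{123}\\
&= \frac{\sqrt{e^{-2\gamma E[\Delta \ln c]}\, e^{\gamma^2\sigma^2[\Delta \ln c]} \left(e^{\gamma^2\sigma^2[\Delta \ln c]} - 1\right)}}{e^{-\gamma E[\Delta \ln c]}\, e^{(\gamma^2/2)\sigma^2[\Delta \ln c]}} \tag{124}\\
&= \frac{e^{-\gamma E[\Delta \ln c]}\, e^{(\gamma^2/2)\sigma^2[\Delta \ln c]} \sqrt{e^{\gamma^2\sigma^2(\Delta \ln c)} - 1}}{e^{-\gamma E(\Delta \ln c)}\, e^{(\gamma^2/2)\sigma^2(\Delta \ln c)}} \tag{125}\\
&= \sqrt{e^{\gamma^2\sigma^2(\Delta \ln c)} - 1} \tag{126}
\end{align}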

The approximation $e^x \approx 1 + x$ comes from taking the Taylor expansion of $e^x$ around 0 and keeping the constant and linear terms. If we consider $\gamma^2\sigma^2(\Delta \ln c)$ to be a small number near 0 we can say:
\begin{align}
e^{\gamma^2\sigma^2(\Delta \ln c)} &\approx 1 + \gamma^2\sigma^2(\Delta \ln c) \tag{127}\\
\sqrt{e^{\gamma^2\sigma^2(\Delta \ln c)} - 1} &\approx \gamma\,\sigma(\Delta \ln c) \tag{128}
\end{align}

The equity premium puzzle starts from the observation that the slope of the mean-standard deviation frontier (equivalent to the maximum Sharpe ratio) is roughly 0.5. Based on the approximation in Eqn 128 that means $0.5 = \gamma\,\sigma(\Delta \ln c)$. Now we are also told that consumption growth has a standard deviation of 1%. Technically our equation uses the log of year over year growth ($\Delta \ln c = \ln(c_{t+1}/c_t)$) but if we approximate the standard deviation of log growth with the standard deviation of growth we get $0.5 = \gamma(0.01)$, which in turn gives us a risk-aversion coefficient of γ = 50. Note that the book provides the mean of consumption growth but I do not believe it is used for this calculation.
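To get a feel for how much the linearization in Eqn 128 matters at these magnitudes, the sketch below compares it with the exact lognormal expression from Eqn 126 and inverts Eqn 126 for γ. The function names are hypothetical and the inputs (a Sharpe ratio of 0.5 and a 1% standard deviation of log consumption growth) are the figures quoted above.

```python
import math

def sharpe_exact(gamma, sigma_c):
    """Right-hand side of Eqn 126 under lognormal consumption growth."""
    return math.sqrt(math.exp(gamma ** 2 * sigma_c ** 2) - 1.0)

def sharpe_approx(gamma, sigma_c):
    """Linearized version from Eqn 128."""
    return gamma * sigma_c

def gamma_for_sharpe(sharpe, sigma_c):
    """Invert Eqn 126 exactly: gamma = sqrt(ln(1 + S^2)) / sigma_c."""
    return math.sqrt(math.log(1.0 + sharpe ** 2)) / sigma_c

sigma_c = 0.01
gamma_approx = 0.5 / sigma_c                  # from 0.5 = gamma * sigma_c
gamma_exact = gamma_for_sharpe(0.5, sigma_c)  # from Eqn 126 without linearizing

print(gamma_approx, gamma_exact)
print(sharpe_approx(50.0, sigma_c), sharpe_exact(50.0, sigma_c))
```

The exact inversion gives a somewhat smaller γ than the linear approximation because γσ = 0.5 is not especially close to zero, but the conclusion that the implied risk aversion is very large is unchanged.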
An asset that lies on the efficient frontier would have a payoff perfectly correlated with the stochastic discount factor. However, the book says that aggregate consumption has a 0.2 correlation with market return. Note that Eqn 65 originally used the correlation between a return and the stochastic discount factor, but here we are looking at the correlation of market return with aggregate consumption. I would think that percentage change in consumption would most closely match the stochastic discount factor $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$, but tracking the correlation of market return to consumption itself was the approximation provided in the book. Since the US stock market only has a correlation of 0.2 to consumption this is saying that the US stock market is not on the efficient frontier, and in fact the Sharpe ratio for something on the efficient frontier would be 5 times larger. This in turn makes γ = 250. Given that the US stock market isn't on the efficient frontier we could imagine an asset on the efficient frontier that had the same variance as the stock market but was perfectly correlated with the stochastic discount factor. This asset would have higher average returns but because of the correlation it would always pay off in good times and always lose in bad times, making it very risky. Note that I generally do not like this switching from discussing micro economics to macro economics without any warning or disclaimer (see Appendix B for more details).

1.4.7 Random Walks and Time-Varying Expected Returns


The book says that trading models that “reliably survive transactions costs and do not
implicitly expose the investor to risk have not yet been reliably demonstrated”. However, I
would argue that various statistical arbitrage and high frequency trading firms have shown
that at least over the time period of a decade people have found proprietary models that have
been predictive and profitable in the stock market. If the entire market knows the direction
a stock will move then the stock price will be corrected very quickly, but if only a small
section of the market has figured out a pattern that has predictive power the inefficiency can
be successfully exploited for a long period of time (such as firms that profited from statistical
arbitrage in the 1990s).

1.4.8 Present-Value Statement


Here is the full derivation for maximizing utility across a multi-period objective by purchasing ξ units of an asset that delivers a stream of dividends $\{d_{t+j}\}$:
\begin{align}
&\max_{\xi}\; u(c_t(\xi)) + E_t\left[\sum_{j=1}^{\infty} \beta^j u(c_{t+j}(\xi))\right] \tag{129}\\
&c_t(\xi) = e_t - p_t\xi \tag{130}\\
&c_{t+j}(\xi) = e_{t+j} + d_{t+j}\xi, \quad j \ge 1 \tag{131}
\end{align}

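Setting the derivative of what we want to optimize to zero to find critical points:
\begin{align}
0 &= \frac{\partial}{\partial\xi}\left( u(c_t(\xi)) + E_t\left[\sum_{j=1}^{\infty} \beta^j u(c_{t+j}(\xi))\right] \right) \tag{132}\\
0 &= \frac{\partial}{\partial\xi}\left( u(e_t - p_t\xi) + E_t\left[\sum_{j=1}^{\infty} \beta^j u(e_{t+j} + d_{t+j}\xi)\right] \right) \tag{133}\\
0 &= u'(e_t - p_t\xi)(-p_t) + E_t\left[\sum_{j=1}^{\infty} \beta^j u'(e_{t+j} + d_{t+j}\xi)\,d_{t+j}\right] \tag{134}\\
p_t\,u'(e_t - p_t\xi) &= E_t\left[\sum_{j=1}^{\infty} \beta^j u'(e_{t+j} + d_{t+j}\xi)\,d_{t+j}\right] \tag{135}\\
p_t &= E_t\left[\sum_{j=1}^{\infty} \left(\frac{\beta^j u'(e_{t+j} + d_{t+j}\xi)}{u'(e_t - p_t\xi)}\right) d_{t+j}\right] \tag{136}\\
p_t &= E_t\left[\sum_{j=1}^{\infty} \left(\frac{\beta^j u'(c_{t+j})}{u'(c_t)}\right) d_{t+j}\right] \tag{137}\\
p_t &= E_t\left[\sum_{j=1}^{\infty} m_{t+j}\,d_{t+j}\right] \tag{138}
\end{align}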
The book notes that if you chain together the two period version of the equations you must assume the transversality condition that $\lim_{j\to\infty} E_t[m_{t+j}\,p_{t+j}] = 0$ (note that I believe the book has a typo here putting two subscripts below $m$). The transversality condition makes sense because when we summed utility over an infinite horizon in Eqn 129 we did not include a non-zero value for the asset at an infinite horizon.
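The first-order condition behind Eqn 138 can be checked numerically in a small deterministic, finite-horizon version of the problem. Everything specific here is an assumption made for the sketch: power utility with γ = 2, β = 0.95, a five-period endowment path and a made-up dividend stream, with no uncertainty so the expectation drops out. The price is computed from the truncated sum, and the derivative of lifetime utility with respect to ξ is then evaluated at ξ = 0, where consumption equals the endowment.

```python
import numpy as np

beta, gamma = 0.95, 2.0                                  # hypothetical preferences
endowment = np.array([1.00, 1.05, 1.10, 1.15, 1.20])     # e_t, e_{t+1}, ...
dividends = np.array([0.00, 0.10, 0.10, 0.10, 1.00])     # d_{t+j}; index 0 unused

def u(c):
    return c ** (1.0 - gamma) / (1.0 - gamma)            # power utility

def u_prime(c):
    return c ** (-gamma)

j = np.arange(endowment.size)
m = beta ** j * u_prime(endowment) / u_prime(endowment[0])   # m_{t+j}
price = float(np.sum(m[1:] * dividends[1:]))                 # Eqn 138, truncated

def lifetime_utility(xi):
    c = endowment + dividends * xi        # later periods receive dividends
    c[0] = endowment[0] - price * xi      # today we pay for xi units
    return float(np.sum(beta ** j * u(c)))

h = 1e-5   # step for a central finite difference in xi
marginal_gain = (lifetime_utility(h) - lifetime_utility(-h)) / (2 * h)
print(price, marginal_gain)
```

At this price the marginal gain from buying or selling a little of the asset should be zero up to numerical error, which is exactly the statement that the consumer is at an optimum.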

1.5 Discount Factors in Continuous Time


If diffusion process, stochastic process, martingale process, Brownian Motion or Ito’s Lemma
do not feel like well practiced concepts I would recommend reading the Continuous Time
appendix in the book and also watching the class video lectures on stochastic calculus.
Note that when we express standard Brownian motion as $z_{t+\Delta} - z_t \sim N(0, \Delta)$, the change over a time interval $\Delta$ is normally distributed with mean zero and variance $\Delta$. Importantly, the standard deviation of that normal distribution is $\sqrt{\Delta}$.
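The variance versus standard deviation distinction is easy to get wrong when simulating, because most random number libraries take the standard deviation as the scale argument. A minimal sketch with an arbitrary interval length and seed:

```python
import numpy as np

rng = np.random.default_rng(42)
delta = 0.01        # length of the time interval (arbitrary for this sketch)
n_draws = 200_000

# z_{t+delta} - z_t ~ N(0, delta): `scale` is the standard deviation sqrt(delta).
increments = rng.normal(loc=0.0, scale=np.sqrt(delta), size=n_draws)

sample_var = float(increments.var())
sample_std = float(increments.std())
print(sample_var, sample_std)
```

The sample variance should be close to delta and the sample standard deviation close to its square root.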
We have switched from having β as the subjective discount factor to a continuous variable $e^{-\delta t}$ controlled by the subjective discount rate δ (defined in the “Risk-Free Rate” part of Section 1.4 in the book). Similarly to how $D_t$ is the rate of dividends at time $t$, consumption $c_t$ is now expressed as a rate of consumption. The amount we actually consume in some small interval $dt$ is $c_t\,dt$.
A footnote in the book discusses reducing consumption over a small time period $dt$ to finance the purchase of ξ units of the security. We will do a similar derivation but with a small time interval named $\Delta$. Note that planned consumption at time $t$ is expressed as the variable $e$ with a subscript, $e_t$, whereas the exponential function used for the subjective discount rate is expressed as the constant $e$ with a superscript, $e^{-\delta s}$. Also note that for sizable values of $\Delta$ these equations are invalid because we are assuming that we can purchase units of the asset at fixed price $p_t$ throughout the time period $t$ to $t + \Delta$. The continuous time equations that use these results will take the limit as $\Delta \to 0$ so for our purposes this assumption is fine.
\begin{align}
&\max_{\xi}\; E_t\left[\int_{s=0}^{\infty} e^{-\delta s} u(c_{t+s}(\xi))\,ds\right] \tag{139}\\
&c_{t+s}(\xi) = e_{t+s} - (p_t/\Delta)\xi, \quad s \le \Delta \tag{140}\\
&c_{t+s}(\xi) = e_{t+s} + D_{t+s}\xi, \quad s > \Delta \tag{141}
\end{align}

Note that similar to the discrete utility optimization we should think briefly about making
sure our consumption never goes negative (which breaks various utility models). Utility
functions should either have a limit of −∞ when approaching zero from above (which the
commonly used power utility model does assuming γ ≥ 1), or we need to run a constrained
optimization to make sure ct+s is always non-negative.
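Assuming that consumption stream $e$ and dividend stream $D$ are differentiable everywhere, critical points will happen when the derivative is 0:
\begin{align}
0 &= \frac{\partial}{\partial\xi}\left( E_t\left[\int_{s=0}^{\infty} e^{-\delta s} u(c_{t+s}(\xi))\,ds\right] \right) \tag{142}\\
0 &= \frac{\partial}{\partial\xi}\left( E_t\left[\int_{s=0}^{\Delta} e^{-\delta s} u(e_{t+s} - (p_t/\Delta)\xi)\,ds\right] + E_t\left[\int_{s=\Delta}^{\infty} e^{-\delta s} u(e_{t+s} + D_{t+s}\xi)\,ds\right] \right) \tag{143}\\
0 &= E_t\left[\int_{s=0}^{\Delta} e^{-\delta s} u'(e_{t+s} - (p_t/\Delta)\xi)(-p_t/\Delta)\,ds\right] + E_t\left[\int_{s=\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{144}
\end{align}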

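Below we use the shorthand $\Lambda_t \equiv e^{-\delta t} u'(c_t)$. Because planned consumption $e$ and dividend streams $D$ are differentiable around 0 we can say as $\Delta \to 0$:
\begin{align}
\int_{s=0}^{\Delta} e^{0}\, u'(e_t - (p_t/\Delta)\xi)(p_t/\Delta)\,ds &\approx E_t\left[\int_{s=\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{145}\\
\Delta\, u'(e_t - (p_t/\Delta)\xi)(p_t/\Delta) &\approx E_t\left[\int_{s=\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{146}\\
u'(c_t)\,p_t &= E_t\left[\int_{s=0}^{\infty} e^{-\delta s} u'(c_{t+s})(D_{t+s})\,ds\right] \tag{147}\\
p_t\Lambda_t &= E_t\left[\int_{s=0}^{\infty} \Lambda_{t+s} D_{t+s}\,ds\right] \tag{148}
\end{align}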

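Later it is useful to know what happens if we took Eqn 143 but instead decided to start purchasing shares at time $t + a$ at price $p_{t+a}$ for a small positive constant $a$ and complete our purchasing of shares at time $t + a + \Delta$. As above, because we are going to collapse $\Delta$ and $a$ to 0 in the following continuous time equations, we feel justified in using a constant price $p_{t+a}$ which we will approximate at time $t$. The first integral below does not depend on ξ, so its derivative is 0.
\begin{align}
0 &= \frac{\partial}{\partial\xi}\left( E_t\left[\int_{s=0}^{a} e^{-\delta s} u(e_{t+s})\,ds\right] + E_t\left[\int_{s=a}^{a+\Delta} e^{-\delta s} u(e_{t+s} - (p_{t+a}/\Delta)\xi)\,ds\right] + E_t\left[\int_{s=a+\Delta}^{\infty} e^{-\delta s} u(e_{t+s} + D_{t+s}\xi)\,ds\right] \right) \tag{149}\\
0 &= 0 + E_t\left[\int_{s=a}^{a+\Delta} e^{-\delta s} u'(e_{t+s} - (p_{t+a}/\Delta)\xi)(-p_{t+a}/\Delta)\,ds\right] + E_t\left[\int_{s=a+\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{150}
\end{align}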

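Continuing as we did before but keeping the expectation operator on both sides since $t + a$ is in the future:
\begin{align}
E_t\left[\int_{s=a}^{a+\Delta} e^{-\delta a} u'(e_{t+a} - (p_{t+a}/\Delta)\xi)(p_{t+a}/\Delta)\,ds\right] &\approx E_t\left[\int_{s=a+\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{151}\\
E_t\left[\Delta\, e^{-\delta a} u'(e_{t+a} - (p_{t+a}/\Delta)\xi)(p_{t+a}/\Delta)\right] &\approx E_t\left[\int_{s=a+\Delta}^{\infty} e^{-\delta s} u'(e_{t+s} + D_{t+s}\xi)(D_{t+s})\,ds\right] \tag{152}\\
E_t\left[e^{-\delta a} u'(c_{t+a})\,p_{t+a}\right] &= E_t\left[\int_{s=a}^{\infty} e^{-\delta s} u'(c_{t+s})(D_{t+s})\,ds\right] \tag{153}\\
E_t\left[p_{t+a}\Lambda_{t+a}\right] &= E_t\left[\int_{s=a}^{\infty} \Lambda_{t+s} D_{t+s}\,ds\right] \tag{154}
\end{align}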

The book says that the ratio $u'(c_{t+\Delta})/u'(c_t)$ may not be well defined over a short time interval, and I believe that is because there could be an instantaneous jump in consumption between the instantaneous value $c_t$ and nearby values of $c_{t+\Delta}$.
If we scale Λ by the risk-free rate $R^f$ the book says this will look like a risk-free formulation. I'm not sure if this is exactly what was intended but my derivation is as follows:
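\begin{align}
\Lambda_t &= e^{-\delta t} e^{r^f_t} u'(c_t) \tag{155}\\
p_t\,u'(c_t) &= E_t\left[\int_{s=0}^{\infty} e^{-\delta s} u'(c_{t+s}) D_{t+s}\,ds\right] \tag{156}\\
p_t\,e^{r^f_t} u'(c_t)\,e^{-r^f_t} &= E_t\left[\int_{s=0}^{\infty} \left(e^{-\delta s} e^{r^f_{t+s}} u'(c_{t+s})\right) D_{t+s}\,e^{-r^f_{t+s}}\,ds\right] \tag{157}\\
p_t\,e^{-\delta t} e^{r^f_t} u'(c_t)\,e^{-r^f_t} &= E_t\left[\int_{s=0}^{\infty} \left(e^{-\delta t} e^{-\delta s} e^{r^f_{t+s}} u'(c_{t+s})\right) D_{t+s}\,e^{-r^f_{t+s}}\,ds\right] \tag{158}\\
p_t\Lambda_t\,e^{-r^f_t} &= E_t\left[\int_{s=0}^{\infty} \Lambda_{t+s} D_{t+s}\,e^{-r^f_{t+s}}\,ds\right] \tag{159}
\end{align}
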
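As the book suggests we can use Eqn 148 and take the difference between buying a security at time $t$ and selling it at time $t + \Delta$.
\begin{align}
p_t\Lambda_t &= E_t\left[\int_{s=0}^{\infty} \Lambda_{t+s} D_{t+s}\,ds\right] \tag{160}\\
p_t\Lambda_t &= E_t\left[\int_{s=0}^{\Delta} \Lambda_{t+s} D_{t+s}\,ds\right] + E_t\left[\int_{s=\Delta}^{\infty} \Lambda_{t+s} D_{t+s}\,ds\right] \tag{161}\\
p_t\Lambda_t &= E_t\left[\int_{s=0}^{\Delta} \Lambda_{t+s} D_{t+s}\,ds\right] + E_t\left[p_{t+\Delta}\Lambda_{t+\Delta}\right] \tag{162}
\end{align}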

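Eqn 162 uses the identity shown in Eqn 154.

As stated in the book, for small $\Delta$ the integral can be approximated:
\begin{align}
p_t\Lambda_t &\approx \Lambda_t D_t \Delta + E_t[p_{t+\Delta}\Lambda_{t+\Delta}] \tag{163}\\
p_t\Lambda_t &\approx \Lambda_t D_t \Delta + E_t[p_t\Lambda_t + (p_{t+\Delta}\Lambda_{t+\Delta} - p_t\Lambda_t)] \tag{164}\\
0 &\approx \Lambda_t D_t \Delta + E_t[(p_{t+\Delta}\Lambda_{t+\Delta} - p_t\Lambda_t)] \tag{165}
\end{align}

In the limit as $\Delta \to 0$:
\begin{equation}
0 = \Lambda_t D_t\,dt + E_t[d(p_t\Lambda_t)] \tag{166}
\end{equation}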

I like the following explanation of Eqn 166 given in the book. Assuming there are no dividends and the stochastic discount factor $\Lambda_t$ stays at its current (scalar) value, $E_t[d(\Lambda_t p_t)] = 0$ reduces to $E_t[dp_t] = 0$, which means that price should follow a martingale (a stochastic process whose expected change is zero). Once we let $\Lambda_t$ take its full range of possible values at time $dt$ in the future, $E_t[d(\Lambda_t p_t)] = 0$ means that the marginal utility weighted price should follow a martingale. What “marginal utility weighted” means is that we can think of reweighting future states of the world in terms of how much they matter to us. Because of the concave utility function, states of the world that correspond to low personal consumption will have higher weight than states with high consumption (see risk neutral probabilities in Chapter 3). So the price, weighted by how much it matters to us, should follow a martingale with mean zero. Finally the $(\Lambda_t D_t\,dt)$ term adjusts for the rate at which you are receiving dividends.
If the utility weighted price didn't have an expected change of zero, that would mean that you should have bought more or less of the security and you are not at your optimal equilibrium. The optimal equilibrium happens when according to marginal utility you don't really want to have any more or less of the security, which corresponds to the expected change of 0.
As an example of using Ito's Lemma and stochastic calculus let's look at how to derive the differential of the product of two stochastic variables so that we can next derive $d(\Lambda_t p_t)$. At a high level Ito's Lemma says that to find the differential of a stochastic process take the second order Taylor expansion, and then throw out $(dt)^2$ and $dz_t\,dt$ terms, but keep $(dz_t)^2$ terms because $(dz_t)^2 = dt$ (the variance of the diffusion is equal to the time elapsed). For this problem we won't even dig into the mean $dt$ and diffusion $dz_t$ terms, we simply do the second order expansion:
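\begin{align}
d(X_t Y_t) &= \frac{\partial (X_t Y_t)}{\partial X_t}\,dX_t + \frac{\partial (X_t Y_t)}{\partial Y_t}\,dY_t + 2\,\frac{1}{2}\frac{\partial^2 (X_t Y_t)}{\partial X_t\,\partial Y_t}\,dX_t\,dY_t + \frac{1}{2}\frac{\partial^2 (X_t Y_t)}{(\partial X_t)^2}\,dX_t^2 + \frac{1}{2}\frac{\partial^2 (X_t Y_t)}{(\partial Y_t)^2}\,dY_t^2 \tag{167}\\
d(X_t Y_t) &= (Y_t)\,dX_t + (X_t)\,dY_t + 2\,\frac{1}{2}(1)\,dX_t\,dY_t + (0)\,dX_t^2 + (0)\,dY_t^2 \tag{168}
\end{align}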
The second derivatives with respect to $(\partial X_t)^2$ and $(\partial Y_t)^2$ are both zero (that does not require Ito's Lemma at all, that is just basic derivatives of the product $X_t Y_t$), and the cross partial derivative with respect to $(\partial X_t\,\partial Y_t)$ is 1. Cleaning this up we are left with:

\begin{equation}
d(X_t Y_t) = Y_t\,dX_t + X_t\,dY_t + dX_t\,dY_t \tag{169}
\end{equation}

This is used to derive the following:

d(Λp) = p dΛ + Λ dp + dp dΛ (170)
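The product rule and the $(dz)^2 = dt$ heuristic can both be illustrated with a simulated path. The sketch below is an assumption-laden toy: two geometric-style diffusions with made-up drift and volatility coefficients, driven by the same Brownian shocks and discretized with simple Euler steps. In discrete time the identity Δ(XY) = YΔX + XΔY + ΔXΔY holds exactly, and the sum of squared shocks over the horizon is close to the elapsed time.

```python
import numpy as np

rng = np.random.default_rng(7)
n_steps, horizon = 100_000, 1.0
dt = horizon / n_steps
dz = rng.normal(0.0, np.sqrt(dt), size=n_steps)   # shared Brownian shocks

# Two hypothetical diffusions driven by the same shock (Euler steps).
x = np.empty(n_steps + 1)
y = np.empty(n_steps + 1)
x[0], y[0] = 1.0, 2.0
for k in range(n_steps):
    x[k + 1] = x[k] + 0.05 * x[k] * dt + 0.20 * x[k] * dz[k]
    y[k + 1] = y[k] - 0.02 * y[k] * dt + 0.10 * y[k] * dz[k]

dx, dy = np.diff(x), np.diff(y)
d_xy = np.diff(x * y)
product_rule = y[:-1] * dx + x[:-1] * dy + dx * dy   # Eqn 169, step by step

cross_term_total = float(np.sum(dx * dy))       # does not vanish as dt shrinks
quadratic_variation = float(np.sum(dz ** 2))    # close to the horizon

print(cross_term_total, quadratic_variation)
```

The accumulated cross term is the part that ordinary calculus would drop, which is why the extra $dX_t\,dY_t$ term has to be kept whenever both processes have a diffusion component.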

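Going back to Eqn 166 we now have:
\begin{align}
0 &= \Lambda_t D_t\,dt + E_t[d(p_t\Lambda_t)] \tag{171}\\
0 &= \Lambda_t D_t\,dt + E_t[p_t\,d\Lambda_t + \Lambda_t\,dp_t + dp_t\,d\Lambda_t] \tag{172}\\
0 &= \frac{\Lambda_t D_t\,dt}{\Lambda_t p_t} + \frac{E_t[p_t\,d\Lambda_t + \Lambda_t\,dp_t + dp_t\,d\Lambda_t]}{\Lambda_t p_t} \tag{173}\\
0 &= \frac{D_t}{p_t}\,dt + E_t\left[\frac{d\Lambda_t}{\Lambda_t} + \frac{dp_t}{p_t} + \frac{dp_t}{p_t}\frac{d\Lambda_t}{\Lambda_t}\right] \tag{174}
\end{align}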

Plugging in the risk-free rate defined as the asset where $dp_t/p_t = r^f_t\,dt$, $D_t = 0$ we are most of the way there. The last trick is that according to Ito's Lemma $(d\Lambda_t\,dp_t)$ is only non-zero if the two diffusion terms overlap to create a $(dz_t)^2$ term. Since the risk-free rate does not have a diffusion component, $(d\Lambda_t\,dp_t) = 0$.
\begin{align}
0 &= \frac{0}{p_t}\,dt + E_t\left[\frac{d\Lambda_t}{\Lambda_t} + r^f_t\,dt + 0\right] \tag{175}\\
r^f_t\,dt &= -E_t\left[\frac{d\Lambda_t}{\Lambda_t}\right] \tag{176}
\end{align}
Relating this to the discrete case you can see that Eqn 176 and Eqn 179 are fairly similar.
\begin{align}
R^f_t &= \frac{1}{E_t[m_{t+1}]} \tag{177}\\
1 + r^f_t &= \frac{u'(c_t)}{\beta E_t[u'(c_{t+1})]} \tag{178}\\
r^f_t &= -\frac{E_t[\beta u'(c_{t+1}) - u'(c_t)]}{\beta E_t[u'(c_{t+1})]} \tag{179}
\end{align}
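Similarly the book relates the following continuous time equation:
\begin{align}
0 &= \frac{D_t}{p_t}\,dt + E_t\left[-r^f_t\,dt + \frac{dp_t}{p_t} + \frac{d\Lambda_t}{\Lambda_t}\frac{dp_t}{p_t}\right] \tag{180}\\
E_t\left[\frac{dp_t}{p_t}\right] + \frac{D_t}{p_t}\,dt &= r^f_t\,dt - E_t\left[\frac{d\Lambda_t}{\Lambda_t}\frac{dp_t}{p_t}\right] \tag{181}
\end{align}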
The book makes a comment that the last term of Eqn 181 is the same as the covariance. Computing the covariance is simply taking the expectation of the product of two random variables with zero mean. Ito's Lemma says that if we are looking at the product of two differentials $(d\Lambda_t)(dp_t)$ all second order terms will fall away except for the $(dz_t)^2$ term. So in this case we are looking at the expectation when two random diffusion variables, each with mean 0, are multiplied. That is the exact definition of the covariance computation. The book also makes a comment about the last term being equal to the “second moment” but moments are usually defined over a single variable or function, not two, so I'm not sure what definition of second moment is being used here (the second moment of the product of the two variables does not seem to work).
Now that we see that Eqn 181 has a covariance term it looks very similar to the discrete equation in Eqn 184.
\begin{align}
E_t[R^i_t] &= R^f_t - R^f_t \operatorname{cov}_t[m_{t+1}, R^i] \tag{182}\\
E_t[R^i_t - 1] &= (R^f_t - 1) - R^f_t \operatorname{cov}_t[m_{t+1}, R^i] \tag{183}\\
E_t[r^i_t] &= r^f_t - R^f_t \operatorname{cov}_t[m_{t+1}, R^i] \tag{184}
\end{align}

I believe the $R^f_t$ term in Eqn 184 does not show up in Eqn 181 because the subjective discount factor in continuous time $e^{-\delta t}$ converges to 1 as $dt$ gets tiny. The book says that this interest rate component “naturally vanishes as the time interval gets short”.
Jumping back to Ito's Lemma, with $\Lambda_t = e^{-\delta t} u'(c_t)$ we consider this to be a product of two functions, $e^{-\delta t}$ which is only a function of $t$ (a drift term) and $u'(c_t)$ which is a function of $c_t$, which we are assuming is a diffusion process that may have both a drift $dt$ and a diffusion $dz_t$ term. Let's call the first function $a = e^{-\delta t}$ and the second function $b = u'(c_t)$.

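\begin{equation}
d(ab) = \frac{\partial (ab)}{\partial a}\,da + \frac{\partial (ab)}{\partial b}\,db + 2\,\frac{1}{2}\frac{\partial^2 (ab)}{\partial a\,\partial b}\,da\,db + \frac{1}{2}\frac{\partial^2 (ab)}{(\partial a)^2}\,da^2 + \frac{1}{2}\frac{\partial^2 (ab)}{(\partial b)^2}\,db^2 \tag{185}
\end{equation}
Because we only care about the second order Taylor expansion terms that contain a $(dz)^2$ term, we can ignore second order terms that contain $(da)$ since that function has no diffusion component so we won't be able to get two $dz$ terms together.
\begin{align}
d(ab) &= \frac{\partial (ab)}{\partial a}\,da + \frac{\partial (ab)}{\partial b}\,db + \frac{1}{2}\frac{\partial^2 (ab)}{(\partial b)^2}\,db^2 \tag{186}\\
d(\Lambda_t) &= (-\delta)e^{-\delta t} u'(c_t)\,dt + e^{-\delta t} u''(c_t)\,dc_t + \frac{1}{2} e^{-\delta t} u'''(c_t)(dc_t)^2 \tag{187}
\end{align}
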
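For $(d\Lambda_t/\Lambda_t)$ we have:
\begin{align}
\frac{d(\Lambda_t)}{\Lambda_t} &= \frac{(-\delta)e^{-\delta t} u'(c_t)\,dt}{e^{-\delta t} u'(c_t)} + \frac{e^{-\delta t} u''(c_t)\,dc_t}{e^{-\delta t} u'(c_t)} + \frac{1}{2}\frac{e^{-\delta t} u'''(c_t)(dc_t)^2}{e^{-\delta t} u'(c_t)} \tag{189}\\
\frac{d(\Lambda_t)}{\Lambda_t} &= -\delta\,dt + \frac{c_t u''(c_t)}{u'(c_t)}\frac{dc_t}{c_t} + \frac{1}{2}\frac{c_t^2 u'''(c_t)}{u'(c_t)}\frac{(dc_t)^2}{c_t^2} \tag{190}
\end{align}
Given the following definitions:
\begin{align}
\gamma_t &= -\frac{c_t u''(c_t)}{u'(c_t)} \tag{191}\\
\eta_t &= \frac{c_t^2 u'''(c_t)}{u'(c_t)} \tag{192}
\end{align}
We can plug this into Eqn 176:
\begin{align}
\frac{d\Lambda_t}{\Lambda_t} &= -\delta\,dt - \gamma_t\frac{dc_t}{c_t} + \frac{1}{2}\eta_t\frac{(dc_t)^2}{c_t^2} \tag{193}\\
r^f_t &= -\frac{1}{dt}E_t\left[\frac{d\Lambda_t}{\Lambda_t}\right] \tag{194}\\
r^f_t &= \delta + \gamma_t\frac{1}{dt}E_t\left[\frac{dc_t}{c_t}\right] - \frac{1}{2}\eta_t\frac{1}{dt}E_t\left[\frac{(dc_t)^2}{c_t^2}\right] \tag{195}
\end{align}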
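For a concrete feel for Eqn 195, the sketch below evaluates $\gamma_t$ and $\eta_t$ for power utility, where $u'(c) = c^{-\gamma}$, and plugs them into the risk-free rate formula. The power utility form, the parameter values and the function names are assumptions of this illustration, and the moments are written as $E_t[dc_t/c_t] = \mu_c\,dt$ and $E_t[(dc_t/c_t)^2] = \sigma_c^2\,dt$.

```python
def power_utility_coefficients(c, gamma):
    """gamma_t and eta_t from Eqns 191-192 when u'(c) = c**(-gamma)."""
    u1 = c ** (-gamma)                                  # u'(c)
    u2 = -gamma * c ** (-gamma - 1.0)                   # u''(c)
    u3 = gamma * (gamma + 1.0) * c ** (-gamma - 2.0)    # u'''(c)
    return -c * u2 / u1, c ** 2 * u3 / u1

def risk_free_rate(delta, gamma_t, eta_t, mu_c, sigma_c):
    """Eqn 195 with E[dc/c] = mu_c dt and E[(dc/c)^2] = sigma_c^2 dt."""
    return delta + gamma_t * mu_c - 0.5 * eta_t * sigma_c ** 2

# Hypothetical inputs: consumption level 1.7, curvature 2, 1% impatience,
# 2% mean consumption growth and 1% consumption volatility.
gamma_t, eta_t = power_utility_coefficients(c=1.7, gamma=2.0)
rf = risk_free_rate(delta=0.01, gamma_t=gamma_t, eta_t=eta_t,
                    mu_c=0.02, sigma_c=0.01)
print(gamma_t, eta_t, rf)
```

With power utility the two coefficients do not depend on the consumption level: $\gamma_t = \gamma$ and $\eta_t = \gamma(\gamma + 1)$. The last term lowers the rate as consumption volatility rises.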
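We can also plug $\gamma_t$ into Eqn 181:
\begin{align}
E_t\left[\frac{dp_t}{p_t}\right] + \frac{D_t}{p_t}\,dt &= r^f_t\,dt - E_t\left[\frac{dp_t}{p_t}\frac{d\Lambda_t}{\Lambda_t}\right] \tag{196}\\
E_t\left[\frac{dp_t}{p_t}\right] + \frac{D_t}{p_t}\,dt - r^f_t\,dt &= -E_t\left[\frac{dp_t}{p_t}\left(-\delta\,dt - \gamma_t\frac{dc_t}{c_t} + \frac{1}{2}\eta_t\frac{(dc_t)^2}{c_t^2}\right)\right] \tag{197}
\end{align}
Based on Ito's Lemma when looking at the right side of Eqn 197 we want to keep all first order terms and any second order terms that might have a $(dz)^2$ term. The $dp_t\,dt$ term has no diffusion $(dz)$ component in $-\delta\,dt$ so that term is out. Also the $(dp_t(dc_t)^2)$ term is out as well because it is a third order term. The only term we need to keep is the $dp_t\,dc_t$ term which is the combination of two diffusion processes and therefore might have a $(dz)^2$ term.
\begin{equation}
E_t\left[\frac{dp_t}{p_t}\right] + \frac{D_t}{p_t}\,dt - r^f_t\,dt = \gamma_t E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] \tag{198}
\end{equation}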
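We have a few identities given right at the end of the chapter:
\begin{align}
\mu_p &= E_t[dp_t/p_t] \tag{199}\\
\sigma_p^2 &= E_t[(dp_t/p_t)^2] \tag{200}\\
\sigma_c^2 &= E_t[(dc_t/c_t)^2] \tag{201}\\
E_t\left[\frac{dp_t}{p_t}\right] + \frac{D_t}{p_t}\,dt - r^f_t\,dt &= \gamma_t E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] \tag{202}\\
\mu_p + \frac{D_t}{p_t}\,dt - r^f_t\,dt &= \gamma_t E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] \tag{203}
\end{align}
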
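Because $E_t[(dc_t/c_t)(dp_t/p_t)]$ is measuring covariance we can relate it to the correlation and standard deviation of the variables, and furthermore bound values based on the correlation always being between -1 and 1.
\begin{align}
E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] &= \rho[c_t, p_t]\,\sigma_p\sigma_c \tag{204}\\
\left| \frac{1}{\sigma_p\sigma_c} E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] \right| &\le 1 \tag{205}\\
\frac{\mu_p + \frac{D_t}{p_t}\,dt - r^f_t\,dt}{\sigma_c\sigma_p\gamma_t} &= \frac{1}{\sigma_c\sigma_p} E_t\left[\frac{dc_t}{c_t}\frac{dp_t}{p_t}\right] \tag{206}\\
\left| \frac{\mu_p + \frac{D_t}{p_t}\,dt - r^f_t\,dt}{\sigma_c\sigma_p\gamma_t} \right| &\le 1 \tag{207}\\
\left| \frac{\mu_p + \frac{D_t}{p_t}\,dt - r^f_t\,dt}{\sigma_p} \right| &\le \gamma_t\sigma_c \tag{208}
\end{align}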

The left term of Eqn 208 has excess return over the standard deviation of returns, which is the Sharpe ratio. We see that the Sharpe ratio is bounded in the continuous time version of the equations as it was in the discrete time version. As before the minimum and maximum Sharpe ratio determine the slope of the efficient frontier. Note that in the book the continuous time bounds for the Sharpe ratio may have a typo, since in the book the Sharpe ratio is only bounded from above, whereas Eqn 208 gives both an upper and lower bound for the Sharpe ratio.

References
[1] J. Cochrane. Asset Pricing (Revised Edition). Princeton University Press, 2005.

Appendix A Inferences About Equilibrium Equations


There are a number of cases where the book looks at an equilibrium equation and then
starts making inferences about variables in those equations. Most of the inferences done in
this chapter seem to have one or more of the below issues which makes the logic behind
the inference unconvincing to me. There may well be references that provide many more
derivations to assuage all of these issues, in which case I still think the book should discuss
the issue and provide links to those references. It is also possible that I am horribly confused
about the whole thing.

• Many of the variables used in equilibrium equations are really functions of other vari-
ables and are not independent of each other. The main equation that everything else
is derived on is an optimization over a single free variable, ξ, the number of asset units
purchased. If we try to make inferences about other variables that depend on ξ (such
as ct , ct+1 , mt+1 ) we are ignoring the possibly complicated interactions between the
variables and the underlying optimization process that defines the equilibrium equa-
tions. Making inferences about how an increase in future consumption ct+1 may affect
something else needs to take into account that if ct+1 changes that probably means
that ξ changes which means that ct changes as well. These interactions are usually
ignored, and the resulting conclusions do not feel mathematically sound.
• We assumed pt was a constant for the initial derivation, but then later the book makes
inferences about other variables changing and how it affects the current price pt . In
Eqn 6 we relied on pt being a constant when taking the derivative with respect to ξ:
(d/dξ)(−pt ξ) = −pt . However, if pt is actually a variable that depends on ξ, then the
derivative could be wrong and all of the further equations could be wrong as well. If
you assume a value is constant in your most early derivation it feels like more discussion
is needed before you can say this value changes as a function of another. In section 2.2
the book claims that you can switch between saying prices are based on consumption
or consumption is based on prices. This may be true, but to mathematically show it
is true we should derive equations that use a new version of our initial optimization
(Eqn 6) where pt is a variable instead of a constant and consumption is fixed (this means
we need to be careful when computing (d/dξ)(−pt ξ)). So far the only derivation the
book has provided has gone from a fixed price to a consumer’s consumption plan. Any
inferences outside of this track are not convincing to me until I see that alternative
derivations work.
• In Eqn 8 we see that pt appears on both the left and right hand side of the equation
whereas in Eqn 9 and others pt does not directly appear on the right hand side of the
equation because it is contained in another variable like ct or mt+1 . As the chapter
progresses it is common that to all visual appearances pt only exists on the left side
of the equation (but this is incorrect). The book will then make inferences about
equilibrium equations and how pt will change on the left side based on changes that
are made to the right side. But again, this is ignoring interactions in the equation
because when you expand out the meaning of all of the variables pt is present on both
the left and right side of the equation. Any inference that discusses the behavior of pt
only looking at one of the two instances of pt is suspect (in addition to pt being defined
as a constant as noted above).

Let's take a simple example. In Eqn 9 we see pt appears on the left side and the right side is scaled by the β factor. So one might think that if β is scaled by half then pt will go down by exactly half. However, pt is actually hidden inside of ct so it appears on both sides of the equation. So if we change pt , the original maximization over ξ in Eqn 6 changes, so in all likelihood the optimal value of ξ will change, which means that ct and ct+1 change, which means that cutting β in half has ripple effects on our optimization variable ξ and the value of pt will probably not simply be cut exactly in half as we first supposed (without a deeper analysis it is hard to say if it is reduced the same, more, less, or perhaps even increased!). This is of course setting aside that asking how pt changes as a function of another variable would seem to contradict our original assumption that pt was constant.
An example from the book is the claim that “Marginal utility u′(ct ) declines as c rises.
Thus, an asset’s price is lowered if its payoff covaries with consumption” in the Risk Corrections
part of section 1.4 in the book, referring to equation 1.10 in the book. Intuitively this
makes sense to me, but I don’t think the book has proved the statement mathematically.
First, we are talking about ct rising, but if ct rises due to a change in our purchase amount ξ,
then ct+1 will decrease, and this complication is never addressed. Second, we assumed that pt
was constant when doing the initial derivation, but now we are making inferences with a new
assumption that pt is a function of other variables. Lastly, our logic is completely ignoring
the pt value hidden inside of ct and only focusing on the pt value on the left hand side of the
equation. This is another dependence between variables that we are not accounting for, and
which will most likely cause the optimal value of ξ to change which will in turn cause ct+1
to change.
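
To make the hidden dependence visible, equation 1.10 can be written out with the budget constraints substituted in. This is only a sketch in my own notation, assuming the chapter's two-period setup with endowments e_t and e_{t+1}, payoff x_{t+1} and purchase amount ξ; it restates the equation and does not settle whether the book's inference holds.

```latex
% Equation 1.10 of the book (time subscripts on E and cov added here)
p_t = \frac{E_t(x_{t+1})}{R^f}
      + \frac{\operatorname{cov}_t\!\left[\beta u'(c_{t+1}),\, x_{t+1}\right]}{u'(c_t)}

% Substituting c_t = e_t - p_t \xi and c_{t+1} = e_{t+1} + x_{t+1} \xi:
p_t = \frac{E_t(x_{t+1})}{R^f}
      + \frac{\operatorname{cov}_t\!\left[\beta u'(e_{t+1} + x_{t+1}\xi),\, x_{t+1}\right]}
             {u'(e_t - p_t \xi)}
```

Written this way, pt shows up inside the marginal utility in the denominator as well as on the left hand side, and ξ shows up in both marginal utilities. The risk-free rate R^f is itself defined through the same marginal utilities, so it also depends on ct and ct+1.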
Again, the inferences may well be correct, but I currently do not find the arguments
supporting these inferences to be mathematically sound.

Appendix B From Micro To Macro


There are a number of places where the book switches from a microeconomic description
(where each consumer has a different stochastic discount factor and utility function) to a
macroeconomic description (where we have a single set of variables that represents the
collective will of the economy). In the “Week 5 Asset Pricing Theory” notes Prof. Cochrane
states there is a “law of aggregation” that lets you apply rules in microeconomics to
macroeconomics given certain assumptions. I wish that this law were more clearly spelled out or
that references were given. I also wish that the examples that jump into macroeconomics
were more clearly marked.
In my mind it is interesting and worth noting that something like idiosyncratic risk and the
efficient frontier can vary from consumer to consumer based on their individual stochastic
discount factor, but that is not emphasized in the book. It may well be that in many
markets consumers have very similar discount factors so we can examine properties of an
asset independent of the consumer that is evaluating it, but how often this is true and what
approximations need to be made is never clearly spelled out in the book.
