
Total Least-Squares (Orthogonal) Regression

                                   Slope    Intercept
Total Least Squares (Orthogonal)   3.2867   0.3381
Ordinary Least Squares (Normal)    2.8625   0.5704

TLS or orthogonal regression gives the 'correct' reciprocal beta. (The sheet's OLS x'y, OLS y'x, TLS x'y and TLS y'x cells contain only #VALUE! errors.)

z is the total least squares fitted value, 0.3381 + 3.2867 x.

x       y       z
0.0118  0.0549  0.3769
0.0648  0.4990  0.5511
0.1365  1.2477  0.7867
0.1509  1.3439  0.8341
0.1730  0.9849  0.9067
0.1934  1.5304  0.9737
0.1991  0.9769  0.9925
0.2523  1.2821  1.1673
0.2714  1.6606  1.2301
0.2844  1.5627  1.2728
0.2897  1.7861  1.2903
0.2987  1.7280  1.3198
0.3028  1.5151  1.3333
0.3093  1.2807  1.3547
0.3412  1.4339  1.4595
0.3420  1.9614  1.4621
0.3704  1.2501  1.5555
0.3784  1.5916  1.5818
0.4449  1.9384  1.8003
0.4692  1.8366  1.8802
0.4966  2.1051  1.9703
0.5226  2.0129  2.0557
0.5341  2.4959  2.0935
0.5417  2.1110  2.1185
0.5466  1.8384  2.1346
0.5681  1.7141  2.2053
0.5936  1.5000  2.2891
0.6213  2.0627  2.3801
0.6449  2.6729  2.4577
0.6602  2.3864  2.5080
0.6614  2.4871  2.5119
0.6822  2.2778  2.5803
0.6946  2.3559  2.6210
0.6979  2.8558  2.6319
0.7027  2.3110  2.6476
0.7271  2.2392  2.7278
0.7373  2.8841  2.7614
0.7948  2.3997  2.9504
0.8180  2.6302  3.0266
0.8216  3.3867  3.0384
0.8385  3.3287  3.0940
0.8537  3.3824  3.1439
0.8600  2.5985  3.1646
0.8757  2.8299  3.2162
0.8801  3.5722  3.2307
0.8939  3.3630  3.2761
0.8998  3.4912  3.2955
0.9568  3.6173  3.4828
0.9797  3.3579  3.5581
0.9883  3.1547  3.5863

[Chart: scatter plot of the data; x axis 0.00 to 1.00, y axis 0.00 to 4.00]

I'm looking to trade a pair of equities, and attempting to keep them market neutral. A simple example is:

ABC, Last Price = 50
XYZ, Last Price = 10

For every 1 share of ABC sold, I would buy (ABC/XYZ) = 5 shares of XYZ.

That works well for two stocks that move around at a similar pace, but let's say I'm looking at AAPL vs SPY. Let's say that for every 1% SPY moves, AAPL moves 2%. In this case the ratio needs to take into account this difference in movement.

AAPL, Last Price = 260
SPY, Last Price = 113

For every 1 share of AAPL sold, I would buy (AAPL/SPY)*2 = 4.6 shares of SPY. In this case AAPL's beta to SPY is 2.

Another way to look at that AAPL vs SPY pair is to say for every share of SPY sold, I would buy (SPY/AAPL)*.5 = .2173 shares of AAPL. In this case SPY's beta to AAPL is .5.

As in the example above, I'd expect to find that the calculation of AAPL's beta to SPY is simply the inverse of SPY's beta to AAPL. Here's where things stop making sense to me. Using Bloomberg to calculate the Beta of AAPL to SPY using daily 2010 returns results in 1.065. Doing the same to calculate the Beta of SPY to AAPL results in 0.475. I've tried manually calculating these betas in Excel and came up with the same answers.

Most examples I try are like this, where the beta of ABC to XYZ does not equal the inverse of the beta of XYZ to ABC. When the beta of ABC to XYZ does not equal the inverse of XYZ to ABC, there is a problem when deciding what ratio to use. The resulting ratio should be the same either way. I think I'm either calculating something wrong or I'm approaching this pair neutrality from the wrong angle and have incorrect expectations for Beta.

What am I missing?

Use an orthogonal regression and the beta will be the same both ways.

Yes, orthogonal regression is better. But if it helps much, you should reconsider the trade.

Think of it this way. OLS regression tells you to hold the portfolio $1 of asset Y and -Beta of asset X. It will compute Beta for you to minimize the standard deviation of this portfolio. Orthogonal regression tells you to hold:

The standard deviation of return of asset Y times -Rho divided by (1 - Rho^2)^0.5 of asset X

The standard deviation of return of asset X divided by (1 - Rho^2)^0.5 of asset Y

where the standard deviations are either estimated from the sample or derived independently. Rho is estimated to minimize the standard deviation of that portfolio.

If these portfolios are very different then the correlation between the two assets is small and asset X has a larger standard deviation than asset Y. In this case, X is not a good hedge for Y as it will not reduce risk much but it will add a lot of volatility. If you held X and used OLS to find out how much Y to use for a hedge, it would give you a small number.

The same t and F stats apply to orthogonal regression as OLS.

Can't help with the first part.

On the second part, dollar-neutral is for funds/people trying to eliminate the need to put up dollars. Delta-neutral is for funds/people trying to eliminate market risk. Choose delta-neutral ...

Volatility enters the calculation by attempting to form the minimum variance portfolio. Write down an expression for the variance of your portfolio and then minimise the variance with respect to the weights of each asset. In the two asset case you will end up with something like p.V1/V2 where p is the correlation coefficient between the two assets and V1, V2 are the volatilities. This expression is just good-old-fashioned beta. Note that this expression can also be derived directly from OLS.

There are two separate questions regarding pair trading

1. Is orthogonal regression a superior way of identifying "correlated" pairs than least square regression? Also, what are the statistical tests for testing the significance of parameters obtained from orthogonal regression (equivalent of t-stat and F-stat)?

2. Once a pair has been identified, how do we ratio the pairs: shall we strive for dollar-neutrality or delta-neutrality? And according to Mcmillan volatility also enters the calculation of the ratio, can somebody explain to me why?

I'm trying to calculate the variance of the beta estimate.

Once you have estimated the orthogonal regression parameters, you can simply measure the perpendicular distance between the data points and the estimated regression line. http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html

I still end up with two different betas as with ordinary OLS... I thought the purpose of orthogonal regression was to give you one beta only??
The two betas should be reciprocal, no?

R^2 is just ss(xy)^2 / (ss(x) * ss(y)), which does not change on interchanging x and y. Similarly, if you look at the t-stat for the slope coefficient (not the intercept) you will end up with a formula which is unchanged on interchanging x and y.

As Athletico wrote, this is orthogonal regression and is very closely related to PCA. For N observations of k dimensional data, usually you need to process a k dimensional matrix (Gauss-Jordan inversion, Cholesky, eigen-decomposition all O(k^3)). Check out the Karhunen-Loeve transform which enables you to process an N dimensional matrix instead. Useful for N<<k. HTH.

I'm looking for a linear regression that will minimize the ("closest distance to the line")^2 rather than the ("vertical distance to the line")^2 as with standard linear least squares.
This is sometimes called orthogonal regression or total least squares regression.
http://en.wikipedia.org/wiki/Total_least_squares
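
The asymmetry discussed in the thread can be checked on the worksheet's own data. The following is a minimal sketch, not anyone's production method: it assumes the summary sums of squares listed in the worksheet further down (SSxx, SSxy, SSyy), and the function names `ols_slope` and `tls_slope` are made up for illustration. It compares the two OLS betas (y on x, x on y) with the two orthogonal-regression slopes.

```python
import math

# Summary statistics copied from the worksheet (rounded as printed there).
SS_XX = 3.57017
SS_XY = 10.21956
SS_YY = 34.04917


def ols_slope(ss_xy: float, ss_regressor: float) -> float:
    """OLS slope: cross sum of squares divided by the regressor's sum of squares."""
    return ss_xy / ss_regressor


def tls_slope(ss_xy: float, ss_regressor: float, ss_response: float) -> float:
    """Orthogonal (total least squares) slope.

    Uses the W + sqrt(W^2 + 1) root, which is the best-fit line only when
    ss_xy > 0 (positively correlated data); that is an assumption of this sketch.
    """
    w = (ss_response - ss_regressor) / (2.0 * ss_xy)
    return w + math.sqrt(w * w + 1.0)


beta_y_on_x = ols_slope(SS_XY, SS_XX)
beta_x_on_y = ols_slope(SS_XY, SS_YY)
tls_y_on_x = tls_slope(SS_XY, SS_XX, SS_YY)
tls_x_on_y = tls_slope(SS_XY, SS_YY, SS_XX)

print(f"OLS y on x: {beta_y_on_x:.4f}   OLS x on y: {beta_x_on_y:.4f}")
print(f"product of the OLS betas (this is r^2): {beta_y_on_x * beta_x_on_y:.4f}")
print(f"TLS y on x: {tls_y_on_x:.4f}   TLS x on y: {tls_x_on_y:.4f}")
print(f"product of the TLS slopes: {tls_y_on_x * tls_x_on_y:.4f}")
```

The product of the two OLS betas is r^2, so they are reciprocal only when the correlation is perfect. The two orthogonal slopes multiply to 1 by construction, which is the "same both ways" property the replies refer to.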
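A Comparison of a Simple Linear Regression and a Total Least Squares Fit

Simple Linear Regression: Minimizes the sum of squared y deviations from the line of best fit

Total Least Squares: Minimizes the sum of squared distances of the data points from the line of best fit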

X0 i xi yi
1 1 0.0118 0.0549
1 2 0.0648 0.4990
1 3 0.1365 1.2477
1 4 0.1509 1.3439
1 5 0.1730 0.9849
1 6 0.1934 1.5304
1 7 0.1991 0.9769
1 8 0.2523 1.2821
1 9 0.2714 1.6606
1 10 0.2844 1.5627
1 11 0.2897 1.7861
1 12 0.2987 1.7280
1 13 0.3028 1.5151
1 14 0.3093 1.2807
1 15 0.3412 1.4339
1 16 0.3420 1.9614
1 17 0.3704 1.2501
1 18 0.3784 1.5916
1 19 0.4449 1.9384
1 20 0.4692 1.8366
1 21 0.4966 2.1051
1 22 0.5226 2.0129
1 23 0.5341 2.4959
1 24 0.5417 2.1110
1 25 0.5466 1.8384
1 26 0.5681 1.7141
1 27 0.5936 1.5000
1 28 0.6213 2.0627
1 29 0.6449 2.6729
1 30 0.6602 2.3864
1 31 0.6614 2.4871
1 32 0.6822 2.2778
1 33 0.6946 2.3559
1 34 0.6979 2.8558
1 35 0.7027 2.3110
1 36 0.7271 2.2392
1 37 0.7373 2.8841
1 38 0.7948 2.3997
1 39 0.8180 2.6302
1 40 0.8216 3.3867
1 41 0.8385 3.3287
1 42 0.8537 3.3824
1 43 0.8600 2.5985
1 44 0.8757 2.8299
1 45 0.8801 3.5722
1 46 0.8939 3.3630
1 47 0.8998 3.4912
1 48 0.9568 3.6173
1 49 0.9797 3.3579
1 50 0.9883 3.1547
Sum 27.3780 106.8877

x bar 0.54756
y bar 2.13775
SSxx 3.57017
SSxy 10.21956
SSyy 34.04917

B^1 = SSxy/SSxx 2.862486
B^0 = y bar - B^1 * x bar 0.570371
Se 0.316091
Standard Err Slope 0.167289
Standard Err Intercept 0.101926
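
The slope, intercept and standard errors above can be reproduced from the summary statistics alone. This is a minimal sketch under stated assumptions: n = 50 (the number of data rows), the textbook simple-regression formulas for Se and the standard errors, and inputs rounded as printed in the worksheet. The variable names are illustrative only.

```python
import math

# Inputs copied from the worksheet (rounded as printed there).
N = 50
X_BAR, Y_BAR = 0.54756, 2.13775
SS_XX, SS_XY, SS_YY = 3.57017, 10.21956, 34.04917

# OLS estimates.
slope = SS_XY / SS_XX
intercept = Y_BAR - slope * X_BAR

# Residual sum of squares and the standard error of the regression (Se).
sse = SS_YY - slope * SS_XY
se = math.sqrt(sse / (N - 2))

# Standard errors of the two coefficients.
se_slope = se / math.sqrt(SS_XX)
se_intercept = se * math.sqrt(1.0 / N + X_BAR ** 2 / SS_XX)

print(f"slope={slope:.6f} intercept={intercept:.6f}")
print(f"Se={se:.6f} se_slope={se_slope:.6f} se_intercept={se_intercept:.6f}")
```

Because the inputs are rounded, the results should agree with the worksheet only to about four decimal places.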

X0

Total Least Squares


(SSyy-SSxx)/(2SSxy) 1.49121
Slope + 3.28668
Slope - -0.30426
Intercept 0.33810
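
For readers wondering where Slope + and Slope - come from, here is a sketch of the usual derivation. It assumes the fitted line passes through (x bar, y bar) and introduces the symbol m for the slope and S(m) for the sum of squared perpendicular distances; neither symbol appears in the worksheet.

```latex
\begin{aligned}
S(m) &= \sum_{i} \frac{\left[(y_i-\bar{y}) - m\,(x_i-\bar{x})\right]^2}{1+m^2}
      = \frac{SS_{yy} - 2m\,SS_{xy} + m^2\,SS_{xx}}{1+m^2} \\
0 &= \frac{dS}{dm} \;\Longrightarrow\; SS_{xy}\,m^2 + (SS_{xx}-SS_{yy})\,m - SS_{xy} = 0 \\
W &= \frac{SS_{yy}-SS_{xx}}{2\,SS_{xy}} \;\Longrightarrow\; m^2 - 2W\,m - 1 = 0 \\
m_{\pm} &= W \pm \sqrt{W^2+1}, \qquad m_{+}\,m_{-} = -1
\end{aligned}
```

The two roots describe perpendicular directions. With SSxy > 0, as here, the "+" root is the best-fit line and the "-" root is the worst-fit direction; W = 1.49121 gives 3.28668 and -0.30426, and the intercept is y bar minus Slope + times x bar.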

r = SSxy/SQRT(SSxx*SSyy) 0.926903
r2 0.859149

ANOVA Table
Source          Sum of Squares
Linear Model    29.25333
Error           4.79585
Total           34.0491749042

Note: The Two Lines Intersect at (x bar, y bar)
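
The ANOVA sums of squares, and the F statistic the worksheet derives from them, can be rebuilt from SSxx, SSxy and SSyy. This is a minimal sketch assuming n = 50 and one regressor. The variable names are illustrative, and the p-value is left out because it needs an F-distribution routine that is outside the standard library.

```python
# Inputs copied from the worksheet (rounded as printed there).
N = 50
SS_XX, SS_XY, SS_YY = 3.57017, 10.21956, 34.04917

# Partition of the total sum of squares.
ss_total = SS_YY
ss_model = SS_XY ** 2 / SS_XX
ss_error = ss_total - ss_model

# Degrees of freedom and mean squares for a one-regressor model.
df_model, df_error = 1, N - 2
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# Two algebraically equivalent routes to the F statistic.
f_from_mean_squares = ms_model / ms_error
r_squared = ss_model / ss_total
f_from_r_squared = (N - 2) * r_squared / (1.0 - r_squared)

print(f"SS model={ss_model:.5f} SS error={ss_error:.5f} SS total={ss_total:.5f}")
print(f"F (mean squares)={f_from_mean_squares:.3f} F (r^2 form)={f_from_r_squared:.3f}")
```

Both routes give the same F because r2 is the model sum of squares divided by the total.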


[Figure: Scatter Plot of the data; x axis 0.00 to 1.00, y axis 0.00 to 4.00]

[Figure: Residuals of Simple Linear Regression; x axis 0.00 to 1.00, residual axis -1.0 to 1.0]

xi*yi xi^2 yi^2 y^i e^i (e^i)^2


0.0006 0.0001 0.0030 0.6041 -0.5492 0.3017
0.0323 0.0042 0.2490 0.7559 -0.2569 0.0660
0.1703 0.0186 1.5568 0.9611 0.2866 0.0821
0.2028 0.0228 1.8061 1.0023 0.3416 0.1167
0.1704 0.0299 0.9700 1.0656 -0.0807 0.0065
0.2960 0.0374 2.3421 1.1240 0.4064 0.1652
0.1945 0.0396 0.9543 1.1403 -0.1634 0.0267
0.3235 0.0637 1.6438 1.2926 -0.0105 0.0001
0.4507 0.0737 2.7576 1.3472 0.3134 0.0982
0.4444 0.0809 2.4420 1.3845 0.1782 0.0318
0.5174 0.0839 3.1902 1.3996 0.3865 0.1494
0.5162 0.0892 2.9860 1.4254 0.3026 0.0916
0.4588 0.0917 2.2955 1.4371 0.0780 0.0061
0.3961 0.0957 1.6402 1.4557 -0.1750 0.0306
0.4892 0.1164 2.0561 1.5471 -0.1132 0.0128
0.6708 0.1170 3.8471 1.5493 0.4121 0.1698
0.4630 0.1372 1.5628 1.6306 -0.3805 0.1448
0.6023 0.1432 2.5332 1.6535 -0.0619 0.0038
0.8624 0.1979 3.7574 1.8439 0.0945 0.0089
0.8617 0.2201 3.3731 1.9134 -0.0768 0.0059
1.0454 0.2466 4.4314 1.9919 0.1132 0.0128
1.0519 0.2731 4.0518 2.0663 -0.0534 0.0029
1.3331 0.2853 6.2295 2.0992 0.3967 0.1574
1.1435 0.2934 4.4563 2.1210 -0.0100 0.0001
1.0049 0.2988 3.3797 2.1350 -0.2966 0.0880
0.9738 0.3227 2.9381 2.1965 -0.4824 0.2328
0.8904 0.3524 2.2500 2.2695 -0.7695 0.5922
1.2816 0.3860 4.2547 2.3488 -0.2861 0.0819
1.7238 0.4159 7.1444 2.4164 0.2565 0.0658
1.5755 0.4359 5.6949 2.4602 -0.0738 0.0054
1.6450 0.4374 6.1857 2.4636 0.0235 0.0006
1.5539 0.4654 5.1884 2.5232 -0.2454 0.0602
1.6364 0.4825 5.5503 2.5587 -0.2028 0.0411
1.9931 0.4871 8.1556 2.5681 0.2877 0.0828
1.6239 0.4938 5.3407 2.5818 -0.2708 0.0734
1.6281 0.5287 5.0140 2.6517 -0.4125 0.1701
2.1264 0.5436 8.3180 2.6809 0.2032 0.0413
1.9073 0.6317 5.7586 2.8455 -0.4458 0.1987
2.1515 0.6691 6.9180 2.9119 -0.2817 0.0793
2.7825 0.6750 11.4697 2.9222 0.4645 0.2158
2.7911 0.7031 11.0802 2.9706 0.3581 0.1283
2.8876 0.7288 11.4406 3.0141 0.3683 0.1357
2.2347 0.7396 6.7522 3.0321 -0.4336 0.1880
2.4781 0.7669 8.0083 3.0771 -0.2472 0.0611
3.1439 0.7746 12.7606 3.0896 0.4826 0.2329
3.0062 0.7991 11.3098 3.1291 0.2339 0.0547
3.1414 0.8096 12.1885 3.1460 0.3452 0.1191
3.4610 0.9155 13.0849 3.3092 0.3081 0.0949
3.2897 0.9598 11.2755 3.3747 -0.0168 0.0003
3.1178 0.9767 9.9521 3.3994 -0.2447 0.0599
Sum: 68.7470 18.5613 262.5488 106.8877 0.0000 4.79585

Population Values

Slope                 2.86248555    Intercept                0.570371414
Standard Err Slope    0.1672891     Standard Err Intercept   0.101926341
r2                    0.85914943    Se                       0.316090895
F                     292.786686    D.F.                     48
SS Linear Model       29.2533291    SS Error                 4.795845791

TLS slope formulas (Y = MX + C)
W = (SSyy-SSxx)/(2SSxy)
Slope + = W + SQRT(W^2 + 1)
Slope - = W - SQRT(W^2 + 1)

ANOVA Table (continued)
                D.F.   Mean Sq
Linear Model    1      29.25333
Error           48     0.09991
Total           49     0.69488

Fobs = (n-2)r2/(1 - r2) = 292.7866863441
Fcritical = #VALUE!
P-Value = 2.62795027E-22

y^i Total (total least squares fitted values): the same 50 values as column z of the first table; Sum 106.88770

TAN(ATAN[0.5*Linest(Yrange,Xrange)] + 0.5*ATAN[1/Linest(Xrange,Yrange)])
2.07084784

Actually I think there's a typo in there. The left inner term should probably be 0.5*ATAN[Linest(Y,X)] because then the intuition clearly becomes an equally weighted average of the angles - not slopes - of the two regressions (y vs. x and x vs. y). I don't know of a rigorous derivation of that expression and Aaron Brown didn't give a proof.

Worked values with the corrected left term:
Linest(Yrange,Xrange)                    2.862485547
0.5*ATAN[Linest(Yrange,Xrange)]          0.617351796
1/Linest(Xrange,Yrange)                  3.331766811556
0.5*ATAN[1/Linest(Xrange,Yrange)]        0.639605065346
TAN of the sum of the two half-angles    3.08103583
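
The corrected expression from the comment above, an equally weighted average of the two regression angles, can be checked numerically. This is a minimal sketch assuming the worksheet's rounded sums of squares and standing in for the two Linest calls with plain OLS slopes. The variable names are illustrative.

```python
import math

# Sums of squares copied from the worksheet (rounded as printed there).
SS_XX, SS_XY, SS_YY = 3.57017, 10.21956, 34.04917

slope_y_on_x = SS_XY / SS_XX  # stands in for Linest(Yrange, Xrange)
slope_x_on_y = SS_XY / SS_YY  # stands in for Linest(Xrange, Yrange)

# Angles of the two OLS lines, both measured in the (x, y) plane.
angle_y_on_x = math.atan(slope_y_on_x)
angle_x_on_y = math.atan(1.0 / slope_x_on_y)

# Slope of the line whose angle is the equally weighted average of the two.
angle_average_slope = math.tan(0.5 * angle_y_on_x + 0.5 * angle_x_on_y)

print(f"angle-average slope: {angle_average_slope:.4f}")
```

On these inputs the result is about 3.081, which lies between the two OLS lines (2.8625 and 3.3318 in the same plane) and is close to, but not equal to, the total least squares Slope + of 3.28668.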
