Escolar Documentos
Profissional Documentos
Cultura Documentos
Outline
1
Introduction
Notation
History of Paper
Traces
Algebraic Trace Properties
Calculus Trace Properties
Trace Derivatives
Directional Derivatives
Example 1: tr(AX)
Example 2: tr(X T AXB)
Example 3: tr(Y 1 )
Example 4: |Y |
References
Steven W. Nydick
2/82
Introduction
Notation
Notation
A: A matrix
Ac : A matrix held constant
x: A vector
y: A scalar (or a scalar function)
xT or XT : The transpose of x or X
xij : The element in the ith row and jth column of X
(xT )ij : The element in the ith row and jth column of XT
Y
x :
y
X :
yij
x
y
xij
3/82
Introduction
Notation
f (x)
x1
f (x)
x2
...
f (x)
xn
T
If we have the function: f (x) = 2x1 x2 + x22 + x1 x23 , then the Gradient is
f (x)
f (x)
f (x)
f (x) T
=
x
x1
x2
x3
T
= 2x2 + x23 2x1 + 2x2 2x1 x3
Steven W. Nydick
4/82
Introduction
Notation
f(x)
.
..
.
=
. . . . .
xT
fk (x)
fk (x)
...
x1
xn
If we have the function
f(x) = 3x21 + x2
ln(x1 )
sin(x2 )
T
6x
f(x) 1 1
=
x1
xT
0
Steven W. Nydick
1
0
cos(x2 )
5/82
Introduction
Notation
2 f (x)
.
..
..
.
=
.
.
.
xxT
f (x)
f (x)
...
xn x1
x2n
Because our above Gradient is
f (x)
= 2x2 + x23
x
2x1 + 2x2
2x1 x3
T
0
2 f (x)
2
=
xxT
2x3
Steven W. Nydick
2 2x3
2
0
0 2x1
6/82
Introduction
History of Paper
2
4
The new editor told him that a reviewer said: nothing wrong with
paper but not too important.
Steven W. Nydick
7/82
Introduction
History of Paper
Wrote the article Better Never than Late: Peer Review and the
Preservation of Prejudice in 2001.
Steven W. Nydick
8/82
Introduction
History of Paper
Steven W. Nydick
9/82
Traces
What is a Trace?
Definition:
tr (Y) =
(yii ),
Y is square
a11
..
A= .
an1
Steven W. Nydick
..
.
a1n
.. ,
.
ann
10/82
Traces
Linearity of Traces
The MOST important aspect of traces (for our later derivations):
tr : M (R)n R1
is linear
Thus:
tr (A + B) = tr (A) + tr (B)
(1)
and
tr (cA) = c tr (A)
Steven W. Nydick
(2)
11/82
Traces
(3)
Thus:
tr (YT )
tr (Y)
=
X
X
Steven W. Nydick
12/82
Traces
Cyclic Permutation
Property 2: Cyclic Permutation
We have:
tr (AB) = tr (BA)
(4)
b11 b1n
a11 a1m
.. ..
..
..
..
tr (AB) = tr ...
.
.
.
. .
bm1 bmn
an1 anm
m
m
X
X
= a11 b11 + + a1m bm1 +
(a2i bi2 ) + +
(ani bin )
i=1
n X
m
X
i=1
(aji bij )
j=1 i=1
Steven W. Nydick
13/82
Traces
Cyclic Permutation
And also start from the right of Equation
b11 b1n
a11
..
.
.
..
.. ...
tr (BA) = tr .
bm1
=
Pm Pn
i=1
bmn
an1
Pn
j=1
(4).
..
.
Pm
a1m
..
.
anm
n X
m
X
(aji bij )
j=1 i=1
= tr (AB)
So:
tr (AB)
tr (BA)
=
X
X
Rotating the order does not change the trace of square matrices.
Steven W. Nydick
14/82
Traces
Cyclic Permutation
Steven W. Nydick
(uij hij ) =
m X
n
X
(uT )ji hij = tr (UT H)
(5)
j=1 i=1
15/82
Traces
tr (Y)
xij
Thus:
tr (Y)
tr (Y)
=
, (j = 1, . . . , m; i = 1, . . . , n) =
T
xji
(X )
because
tr (Y)
xji
Steven W. Nydick
tr (Y)
X
T
(6)
16/82
Traces
Steven W. Nydick
17/82
Traces
Product Rule
Calculus Property 2: Product Rule
An illustration of the product rule:
Steven W. Nydick
18/82
Traces
Product Rule
Calculus Property 2: Product Rule
Based on the previous illustration:
d(uv)
=
dx
du
dx
(v) + (u)
dv
dx
Steven W. Nydick
19/82
Traces
Product Rule
If we take the derivative of the matrix product with respect to a scalar:
(UV)
x
we find that the i,jth place in our new derivative matrix is
(uTi. v.j )
(ui1 v1j + + uin vnj )
=
x
x
(ui1 v1j )
(uin vnj )
=
+ +
x
x
v1j
vnj
ui1
uin
=
v1j + ui1
+ +
vnj + uin
x
x
x
x
Steven W. Nydick
20/82
Traces
Product Rule
So, now we want to collect terms:
(uTi. v.j )
v1j
vnj
ui1
uin
=
v1j + ui1
+ +
vnj + uin
x
x
x
x
x
v
vnj
ui1
uin
1j
=
v1j + +
vnj + ui1
+ + uin
x
x
x
x
T
v.j
ui.
=
v.j + uTi.
x
x
And because our element is arbitrary, we can generalize:
(UV)
U
V
=
V+U
x
x
x
(UVc ) (Uc V)
=
+
x
x
Steven W. Nydick
(7)
21/82
Traces
Product Rule
There are two notes on the product rule:
Note 1: For the product rule to make sense, both U and V should be
functions of X.
For the Univariate Case, let u = x2 + 2 and v = 2x + sin(x). Then:
d(uv)
du
dv
=
(v) + (u)
dx
dx
dx
= (2x) 2x + sin(x) + (x2 + 2) 2 + cos(x)
= 4x2 + 2x sin(x) + 2x2 + 4 + x2 cos(x) + 2 cos(x)
= x2 6 + cos(x) + 2 x sin(x) + cos(x) + 4
We will discuss the multivariate case later.
Steven W. Nydick
22/82
Traces
Product Rule
There are two notes on the product rule:
Note 2: We can put Vc and Uc inside the derivative funtion, but they
are now constants with respect to X, even if they are functions of X.
For the Univariate Case, let u = x3 + ln(x) and v = 3x2 . Then:
h
i
3 + ln(x) (3x2 )
d
x
c
d(uvc )
=
dx
dx
1
= 3x2 +
(3x2 )
x
= 9x4 + 3x
We will discuss the multivariate case later.
Steven W. Nydick
23/82
Traces
z
x 4x1 + cos(x2 )
= 1 =
x z
x1 sin(x2 )
x2
Our partial derivative with respect to x1 is 4x1 + cos(x2 ), and our
partial derivative with respect to x2 is x sin(x2 ). Furthermore, these
go in the respective parts of our derivative matrix (replacing x1 and x2 ).
Steven W. Nydick
24/82
Traces
and
x2 = t
Then:
dz
dt
[4(3t) + cos(t)]
Steven W. Nydick
by x1
by x2
25/82
Traces
z
x
T
x X
=
t
i=1
z xi
xi t
Steven W. Nydick
26/82
Traces
Z
Z
=
x1
x2
Z t
Z
Z
Z
~
Z
x2
Z
Z
Z
z
x2
z Z
x1 Z
Z
~
Z
=
27/82
Traces
Z
Z
=
x1
x2
Z t
Z
Z
Z
~
Z
x2
Z
Z
Z
z Z
x1 Z
Z
~
Z
z
x2
=
28/82
Traces
Z
Z
x2
t
Z
Z
x1
=
Z
~
Z
x2
Z
zZ
x1
z
x2
Z
Z
~
Z
=
dt
dt
1=
dx1
dx1
dx1 dt
dx1
(x1 distance) =
dz
dz dx1
29/82
Traces
1 dt
Thus, if t moves 1 unit, it moves z: dx
dz dx1 through x1 and it moves
2 dt
z: dx
dz dx2 through x2 , so it in total moves z:
dx1 dt
dx2 dt
z total distance =
+
dz dx1
dz dx2
Steven W. Nydick
30/82
Traces
Z
Z
=
x1
x2
Z t
Z
Z
Z
~
Z
x2
Z
Z
Z
z
x2
z Z
x1 Z
Z
~
Z
=
z
x1
x1
t
+
z
x2
x2
t
2
X
z xi
=
xi t
i=1
31/82
Traces
(8)
(9)
Steven W. Nydick
32/82
Trace Derivatives
Directional Derivatives
Derivatives
Here is the standard derivative definition:
Df (x) = lim
t0
f (x + t) f (t)
t
y
The equation is a infinitesimal form of m = x
; it is finding the slope
or linear approximation to this function as the distance between the
points on the x-axis goes to 0.
Steven W. Nydick
33/82
Trace Derivatives
Directional Derivatives
Dw f (x) = lim
34/82
Trace Derivatives
Directional Derivatives
Dw f (x) = lim
Steven W. Nydick
35/82
Trace Derivatives
Directional Derivatives
Steven W. Nydick
36/82
Trace Derivatives
Directional Derivatives
37/82
Trace Derivatives
Directional Derivatives
If w(i)
f (x + tw(i) ) f (x)
t0
t
Steven W. Nydick
38/82
Trace Derivatives
Directional Derivatives
t0
f (x + tw) f (x)
= wT u
t
reduces to
Dw(i) f (x) = ui
39/82
Trace Derivatives
Directional Derivatives
t0
f (X + tY) f (X)
t
(10)
Steven W. Nydick
40/82
Trace Derivatives
Directional Derivatives
DY f (X) = lim
(11)
by (5)
= uij
Because the ijth place is arbitrary:
f (X)
=U
X
Steven W. Nydick
(12)
41/82
Trace Derivatives
Directional Derivatives
t0
f (X + tY) f (X)
t
42/82
Trace Derivatives
Example 1: tr(AX)
Definition
Our 1st function:
f (X) = tr(AX)
DY f (X) = lim
Steven W. Nydick
43/82
Example 1: tr(AX)
Trace Derivatives
Calculation
f (X + tY) f (X)
t0
t
tr(A[X + tY]) tr(AX)
= lim
t0
t
tr(AX + AtY) tr(AX)
= lim
t0
t
tr(tAY)
= lim
t0
t
= lim tr(AY)
DY f (X) = lim
t0
by (10)
by (1)
by (2)
= tr(AY)
= tr([AY]T )
T
by (3)
= tr(Y A )
Steven W. Nydick
44/82
Trace Derivatives
Example 1: tr(AX)
Result
So, we found that:
f (X + tY) f (X)
t
= tr(YT AT ) = tr(YT U)
DY f (X) = lim
t0
by (11)
U=
Steven W. Nydick
(13)
45/82
Trace Derivatives
Definition
Our 2nd function:
f (X) = tr(XT AXB)
DY f (X) = lim
Steven W. Nydick
46/82
Trace Derivatives
Calculation
f (X + tY) f (X)
by (10)
t
tr([X + tY]T A[X + tY]B) tr(XT AXB)
= lim
t0
t
tr([X + tY]T A[X + tY]B XT AXB)
= lim
by (1)
t0
t
tr(XT AtYB + tYT AXB + tYT AtYB)
= lim
t0
t
T
T
tr(t[X AYB + Y AXB + tYT AYB])
= lim
t0
t
= lim[tr(XT AYB + YT AXB + tYT AYB)]
by (2)
DY f (X) = lim
t0
t0
Steven W. Nydick
47/82
Trace Derivatives
Calculation
Continuing:
DY f (X) = lim[tr(XT AYB + YT AXB + tYT AYB)]
t0
t0
Steven W. Nydick
by (1)
by (4)
48/82
Trace Derivatives
Calculation
And Finally:
DY f (X) = tr(BXT AY) + tr(YT AXB)
= tr[(BXT AY)T ] + tr(YT AXB)
by (3)
by (1)
Steven W. Nydick
49/82
Trace Derivatives
Result
So, we found that:
f (X + tY) f (X)
t
T
T
= tr(Y [A XBT + AXB]) = tr(YT U)
DY f (X) = lim
t0
by (11)
U=
Steven W. Nydick
(14)
50/82
Trace Derivatives
Example 3: tr(Y 1 )
Definition
Steven W. Nydick
51/82
Trace Derivatives
Example 3: tr(Y 1 )
Calculation
We have:
tr(Y1 )
tr(Y2 Y)
=
X
X
=
tr(Y2 Yc )
tr(Y2
c Y)
+
X
X
by (7)
Zoom In
9
1
tr(Y1 Y1 Yc )
tr(Yc Y1
tr(Y1 Y1
c Y )
c Yc )
=
+
X
X
X
by (7) & (4)
Steven W. Nydick
tr(Y1 ) tr(Y1 )
2 tr(Y1 )
+
=
X
X
X
(15)
52/82
Trace Derivatives
Example 3: tr(Y 1 )
Result
Therefore:
tr(Y1 )
tr(Y2
tr(Y2 Yc )
c Y)
=
+
X
X
X
1
tr(Y2
Y)
2
tr(Y
)
tr(Y1 )
c
=
+
X
X
X
tr(Y1 )
tr(Y2
Y)
c
=
X
X
by (15)
(16)
53/82
Trace Derivatives
Example 4: |Y |
Definition
Steven W. Nydick
54/82
Trace Derivatives
Example 4: |Y |
Determinant Review
We know from linear algebra:
Y1 =
1
Q
|Y|
Thus:
|Y|I = QY
Steven W. Nydick
(17)
55/82
Trace Derivatives
Example 4: |Y |
Adjoint Review
Note: (q T )ij = qji does not depend on any of the elements in row i
or column j of Y.
Thus, qji does not depend on yij .
So:
(qji yij )
yij
= qji
|Y| 0
.
0 |Y| ..
..
..
.
.
0
0
Steven W. Nydick
|Y|I = QY
P
0
p (q1p yp1 )
..
..
.
.
=
0
|Y|
O
O
..
by (17)
.
P
p (qnp ypn )
56/82
Trace Derivatives
Example 4: |Y |
(qjp ypj )
(18)
And, for an arbitrary yij , pick the jth row of q and column of y:
P
(q
y
)
jp
pj
p
|Y|
=
by (18)
yij
yij
X
ypj
yij
=
qjp
= qji
= qji = (q T )ij
(19)
y
y
ij
ij
p
Because yij was arbitrary, for an entire matrix:
|Y|
= QT
Y
Steven W. Nydick
(20)
57/82
Trace Derivatives
Example 4: |Y |
i
j cji ij
=
xpq
tr(Qc Y)
=
by (5)
xpq
Because xpq was arbitrary, for an entire matrix:
|Y|
tr(Qc Y)
=
X
X
Steven W. Nydick
(21)
58/82
Definition
Lets say that we have
A=X+E
where A is the observation matrix, E is a matrix of stochastic
fluctuations with a mean of 0, and X is our approximation to A.
In Least Squares, our objective is to minimize the
sum of squared errors:
SSE = e211 + e212 + + e21n + e221 + + e22n + + e2mn
XX
=
e2ij
i
XX
=
(eij eij )
i
j
T
= tr(E E)
Steven W. Nydick
by (5)
59/82
Definition
Steven W. Nydick
60/82
Calculation
A minimization:
(SSE)
tr(ET E)
tr[(A X)T (A X)]
=
=
X
X
X
T
T
T
tr(A A A X X A + XT X)
=
X
T
[tr(A A) tr(AT X) tr(XT X) + tr(XT X)]
=
by (1)
X
tr(AT A) tr(AT X) tr(XT A) tr(XT X)
=
+
X
X
X
X
tr(AT X) tr(XT A) tr(XT X)
=0
+
X
X
X
Steven W. Nydick
61/82
Calculation
Continuing:
(SSE)
tr(AT X) tr(XT A) tr(XT X)
=
+
X
X
X
X
tr(AT X) tr(XT X)
+
by (13) & (3)
= A
X
X
tr(XT X)
= A A +
by (13)
X
tr(XT IXI)
= A A +
X
T
= A A + [I XIT + IXI]
by (14)
= A A + [X + X] = 2A + 2X
Steven W. Nydick
(22)
62/82
Result
As in any Least Squares problem, we should set our derivative equal to
0 in order to find the minimum of the function.
(SSE)
= 2A + 2X = 0
X
2X = 2A
b =X=A
A
Surprisingly, without any constraints on X, the best approximation of
A is A itself.
Oh, the things you learn in calculus ^!
Steven W. Nydick
63/82
LaGrange Multipliers
Pretend you have a function:
f (X)
To maximize or minimize less than mn restraints equivalent to
h(x11 , . . . , xmn )ij = 0
use LaGrange Multipliers uij (one for each restraint), and set
g(X) = f (X) +
XX
i
(uij hij )
(23)
Finally, take the derivative with respect to X, set equal to 0, and solve.
Steven W. Nydick
64/82
Definition
We still want to find an X that minimizes the SSE to best
approximate A; however, we are now subject to the constraint that X
is a Symmetric Matrix.
X Symmetric Means:
X = XT
X XT = 0
The Recipe:
1
2
3
4
5
6
Steven W. Nydick
65/82
Calculation
XX
(uij hij )
i
by (23)
= tr(ET E) + tr(UT H)
by (5)
= tr(ET E) + tr[UT (X XT )]
= tr(ET E) + tr(UT X) tr(UT XT )
= tr(ET E) + tr(UT X) tr(UX)
Steven W. Nydick
by (1)
by (3) & (4)
66/82
Calculation
X
X
X
tr(UT X) tr(UX)
= 2A + 2X +
X
X
T
= 2A + 2X + U U
Steven W. Nydick
by (22)
by (13)
67/82
Calculation
Third -- set the derivative equal to 0:
g(X)
= 2A + 2X + U UT = 0
X
2X = 2A + UT U
X=A+
UT U
2
XT = AT +
Steven W. Nydick
U UT
2
68/82
Result
Fourth -- add XT to both sides and solve for X:
U UT
UT U
+ AT +
2
2
T
T
U U U U
X + X = A + AT +
2
2
T
2X = A + A
X + XT = A +
T
b =X= A+A
A
2
Steven W. Nydick
69/82
Definition
R is a correlation matrix.
diag(R) = I
Steven W. Nydick
70/82
Definition
The function we want to maximize:
u = |U1 (R FFT )U1 |
The function u is a likelihood ratio criterion for a test of independence
after the common factors have been partialed out of the covariance
matrix.
U1 (R FFT )U1 should be close to I, so u should be close to 1.
We want to find the F (and consequently the U2 ) that results in a
determinant as close to 1 as possible.
Steven W. Nydick
71/82
Definition
To make the derivatives simpler, let:
u1 = |U2 |
and
u2 = |R FFT |
Note that:
u1 u2 = |U2 ||RFFT | = |U1 ||RFFT ||U1 | = |U1 (RFFT )U1 |
by (7)
72/82
Calculation:
u1
F
Steven W. Nydick
73/82
Calculation:
u1
F
Continuing:
|U|
|I diag(FFT )|
=
F
F
T
tr Qc [I diag(FF )]
=
F
tr(Qc ) tr[Qc diag(FFT )]
=
F
F
tr[Qc diag(FFT )]
=0
F
tr[Qc diag(FFT )]
=
F
Steven W. Nydick
(by definition)
by (21)
by (1) & (2)
74/82
Calculation:
u1
F
Qc = |U |(U
by (17)
2
75/82
Calculation:
u1
F
Continuing:
tr(Qc FFT )
tr(FT Qc FI)
=
F
F
T
= (Q FIT + QFI)
by (4)
by (14)
= (Q + Q)F
= 2QF
= 2|U2 |U2 F
(Q is symmetric)
by (17)
And, thus:
u1
|U|2
= |U2 |
F
F
= |U2 |2 (2|U2 |U2 F)
by (24)
= 2|U2 |1 U2 F
= 2|U2 |U2 F
Steven W. Nydick
(25)
76/82
Calculation:
u2
F
=
F
F
tr(FT Qc FI)
=0
F
T
= (Q + Q)F
by (21)
by (1) & (2)
by (4)
by (14)
= 2QF
(Q is symmetric)
T
T 1
= 2|R FF |(R FF )
Steven W. Nydick
(26)
77/82
Steven W. Nydick
78/82
Steven W. Nydick
79/82
Steven W. Nydick
(27)
80/82
Result
Based on the previous slide, we have
(RU2 I)F = F(FT U2 F)
Let = (FT U2 F) be diagonal. Then
1 0
0 2
(RU2 I)(f1 , f2 , . . . , fn ) = (f1 , f2 , . . . , fn )
..
.
0
..
.
..
.
0
0
..
.
0
n
81/82
References
References
Steven W. Nydick
82/82