
JOURNAL OF CHEMOMETRICS, VOL. 11, 33–38 (1997)

A FAST METHOD TO COMPUTE ORTHOGONAL LOADINGS PARTIAL LEAST SQUARES

CONSTANTINOS GOUTIS
Departamento de Estadística y Econometría, Universidad Carlos III de Madrid, c/ Madrid 126, E-28903 Getafe, Madrid, Spain

SUMMARY

We give a computationally fast method for orthogonal loadings partial least squares. Our algorithm avoids the multiple regression computations at each step and yields identical scores and loadings to the usual method. We give a proof of the equivalence to the standard algorithm. We discuss briefly the computational advantages over both orthogonal scores and orthogonal loadings partial least squares.
© 1997 by John Wiley & Sons, Ltd.

KEY WORDS biased regression methods; bilinear forms; calibration; projection matrices

1. INTRODUCTION

There are two commonly used partial least squares (PLS) regression algorithms.1–3 The first, the orthogonal scores algorithm, is the older one,4 whereas the second, the orthogonal loadings algorithm, is more recent.5 Their difference lies in the decomposition of the data matrix X = ∑_{i=1}^{r} t_i p_i^T. The first one requires orthogonality of the vectors t_i, whereas the second one requires orthogonality of the vectors p_i. As it turns out, the two methods are equivalent for prediction purposes.1 In the usual formulation the first algorithm is easier computationally, though not always faster,6 and is typically used in computer packages.7,8 This is because it does not require any multiple regression and the orthogonality of the scores simplifies their further use. The second algorithm is easier to interpret and study theoretically.5
The purpose of this paper is to give a fast method to compute the scores and loadings of the
second algorithm. Our method avoids the multiple regression step and, as we will see, requires
significantly fewer computations than both the orthogonal scores and orthogonal loadings
algorithms. Furthermore, it is trivial to program.

2. ORTHOGONAL LOADINGS PLS METHODS

Consider data in the form of a columnwise centered n × k matrix X and a centered n × 1 vector y. In the usual formulation2 the PLS steps are as follows.


Set E_0 = X and f_0 = y.
For i = 1, 2, …, r:

    p_i = E_{i-1}^T f_{i-1} / √(f_{i-1}^T E_{i-1} E_{i-1}^T f_{i-1})    (1)

    t_i = E_{i-1} p_i    (2)

    E_i = E_{i-1} - t_i p_i^T    (3)

    f_i = y - T_i (T_i^T T_i)^{-1} T_i^T y    (4)

where T_i = (t_1 t_2 … t_i).
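As an illustration, steps (1)-(4) can be transcribed almost line by line into NumPy. The sketch below is one possible transcription, not code from the paper; the function name pls_orthogonal_loadings and the choice of solving step (4) with a least squares routine are ours.

    import numpy as np

    def pls_orthogonal_loadings(X, y, r):
        """Standard orthogonal loadings PLS, following steps (1)-(4).

        X (n x k) and y (length n) are assumed to be column-centered already.
        Returns the loadings P (k x r) and scores T (n x r).
        """
        E, f = X.astype(float).copy(), y.astype(float).copy()
        n, k = X.shape
        P, T = np.zeros((k, r)), np.zeros((n, r))
        for i in range(r):
            w = E.T @ f
            p = w / np.sqrt(w @ w)        # step (1): sqrt(f'EE'f) is the norm of E'f
            t = E @ p                     # step (2)
            E = E - np.outer(t, p)        # step (3)
            P[:, i], T[:, i] = p, t
            Ti = T[:, : i + 1]
            beta = np.linalg.lstsq(Ti, y, rcond=None)[0]
            f = y - Ti @ beta             # step (4): residual of y on (t_1, ..., t_i)
        return P, T

Note that step (4) is a multiple regression of y on all scores computed so far, and its cost grows with i.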
We propose to replace the above algorithm by the following.
Set E_0 = X and g_0 = y.
For i = 1, 2, …, r:

    p_i = E_{i-1}^T g_{i-1} / √(g_{i-1}^T E_{i-1} E_{i-1}^T g_{i-1})    (5)

    t_i = E_{i-1} p_i    (6)

    E_i = E_{i-1} - t_i p_i^T    (7)

    g_i = -t_i    (8)
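The recursion (5)-(8) admits an equally short sketch (again our own illustration, with a hypothetical function name):

    import numpy as np

    def pls_fast_orthogonal_loadings(X, y, r):
        """Proposed fast variant, following steps (5)-(8).

        X (n x k) and y (length n) are assumed to be column-centered already.
        No multiple regression is needed: g_i is simply -t_i.
        """
        E, g = X.astype(float).copy(), y.astype(float).copy()
        n, k = X.shape
        P, T = np.zeros((k, r)), np.zeros((n, r))
        for i in range(r):
            w = E.T @ g
            p = w / np.sqrt(w @ w)        # step (5)
            t = E @ p                     # step (6)
            E = E - np.outer(t, p)        # step (7)
            g = -t                        # step (8)
            P[:, i], T[:, i] = p, t
        return P, T

In exact arithmetic both sketches return the same loadings and scores; this is the content of Theorem 1 below.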
Clearly this can be written in a more concise way by merging e.g. (6) and (8). However, we prefer the above form since it facilitates the comparison with (1)–(4). The two algorithms are equivalent in the sense that they decompose X into identical bilinear forms. Before proving this, we define some notation and state a few lemmas that will be needed in the proof. We will denote the range and null space of a matrix E by R(E) and N(E) respectively and its Moore–Penrose inverse by E^+. For i = 1, 2, …, r and t_i, E_i and f_i given by (2)–(4), we define K_i, L_i, M_i and N_i to be the projection matrices onto N(T_i), R(E_i), N(T_i) ∩ R(E_{i-1}) and R(T_i) ∩ R(E_{i-1}) respectively. As a convention we take M_0 to be the identity matrix. With this notation we have the following two general lemmas. Their proofs are straightforward applications of matrix algebra.

Lemma 1
Suppose that f and g are vectors and f̂ and ĝ are the corresponding projections onto R(E) for some matrix E. Then
(a) E^T f̂ = E^T f and
(b) if E^T f = E^T g, then f̂ = ĝ.

Lemma 2
For any matrix E
    (E^+)^T = (EE^T)^+ E    (9)

    (E^+)^T E^+ = (EE^T)^+    (10)
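A quick numerical sanity check of (9) and (10), included here as our own illustration rather than part of the original exposition, is straightforward with NumPy's pseudoinverse:

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 8))   # 6 x 8, rank at most 4
    Ep = np.linalg.pinv(E)
    assert np.allclose(Ep.T, np.linalg.pinv(E @ E.T) @ E)            # identity (9)
    assert np.allclose(Ep.T @ Ep, np.linalg.pinv(E @ E.T))           # identity (10)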
The next two lemmas refer to the orthogonal loadings steps.

Lemma 3
If E_i and E_{i+1} are given by consecutive steps of the method (1)–(4), then

    R(E_i) = R(E_{i+1}) + R(t_{i+1})    (11)

and hence R(E_{i+1}) is a subspace of R(E_i).

Proof
Since p_{i+1} ∈ R(E_i^T) and E_i^+ E_i is the projection matrix onto R(E_i^T), it follows that p_{i+1} = E_i^+ t_{i+1}. Step (3) becomes

    E_{i+1} = E_i - t_{i+1} t_{i+1}^T (E_i^+)^T    (12)

            = E_i - t_{i+1} t_{i+1}^T (E_i E_i^T)^+ E_i    (13)

where equality (13) follows from (9). Now from (10) and the normalization p_{i+1}^T p_{i+1} = 1 we can see that the matrix t_{i+1} t_{i+1}^T (E_i E_i^T)^+ is idempotent and hence a (non-orthogonal) projection matrix onto its range R(t_{i+1}). This and step (13) show that the columns of E_{i+1} are the residuals from projections of the columns of E_i onto t_{i+1}. The result follows immediately. □

Lemma 4
If E_i, E_{i+1} and f_i are given as in (1)–(4), then E_{i+1}^T L_i f_i = 0.

Proof
After expressing E_{i+1}^T and L_i in terms of E_i and f_i and substituting, we obtain

    E_{i+1}^T L_i f_i = [E_i^T - E_i^T f_i f_i^T E_i E_i^T / (f_i^T E_i E_i^T f_i)] E_i (E_i^T E_i)^+ E_i^T f_i    (14)

                      = E_i^T E_i (E_i^T E_i)^+ E_i^T f_i - E_i^T f_i (f_i^T E_i E_i^T E_i (E_i^T E_i)^+ E_i^T f_i) / (f_i^T E_i E_i^T f_i)    (15)

                      = E_i^T f_i - E_i^T f_i (f_i^T E_i E_i^T f_i) / (f_i^T E_i E_i^T f_i) = 0    (16)

where Lemma 1(a) was used to simplify (15).
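Lemmas 3 and 4 are easy to illustrate numerically. The check below is our own, written directly from steps (1)-(4) for i = 1, 2 under the stated centering assumptions; it is not part of the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 6)); X -= X.mean(axis=0)
    y = rng.standard_normal(20); y -= y.mean()

    # Two iterations of steps (1)-(4), keeping the intermediate quantities.
    E0, f0 = X, y
    p1 = E0.T @ f0 / np.sqrt(f0 @ E0 @ E0.T @ f0)
    t1 = E0 @ p1
    E1 = E0 - np.outer(t1, p1)
    f1 = y - t1 * (t1 @ y) / (t1 @ t1)        # regression of y on T_1 = (t_1)
    p2 = E1.T @ f1 / np.sqrt(f1 @ E1 @ E1.T @ f1)
    t2 = E1 @ p2
    E2 = E1 - np.outer(t2, p2)

    L1 = E1 @ np.linalg.pinv(E1)               # orthogonal projection onto R(E_1)
    # Lemma 3: R(E_2) is a subspace of R(E_1), so L_1 leaves the columns of E_2 unchanged.
    assert np.allclose(L1 @ E2, E2)
    # Lemma 4: E_2^T L_1 f_1 = 0 up to rounding error.
    assert np.allclose(E2.T @ L1 @ f1, 0)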


Now our main theorem can be stated as follows.

Theorem 1
The vectors p_i and t_i, i = 1, 2, …, r, given by the algorithm (1)–(4) are identical with the ones given by the algorithm (5)–(8).

Proof
It suffices to show that for the method (1)–(4) the vector E_i^T f_i is a negative multiple of the vector E_i^T t_i for all i.

We have

    E_i^T f_i = E_i^T K_i y              by (4)    (17)
              = E_i^T L_i K_i y          using Lemma 1(a)    (18)
              = E_i^T L_i L_{i-1} K_i y  since, by Lemma 3, L_i projects onto a subspace of R(E_{i-1})    (19)
              = E_i^T L_i (M_i + N_i) K_i y    (20)
              = E_i^T L_i M_i K_i y      since N(T_i) ∩ R(T_i) = 0    (21)
              = E_i^T L_i M_i y          since M_i projects onto a subspace of N(T_i)    (22)
              = E_i^T M_i y              using Lemma 1(a)    (23)

From Lemma 1(b) it follows that L_i f_i = L_i M_i y, and since this vector belongs to R(E_i), it can be written as the sum of its projections onto R(M_{i+1}) and R(N_{i+1}). Therefore

    L_i f_i = M_{i+1} L_i M_i y + N_{i+1} L_i M_i y    (24)

The first term of the above sum can be written as

    M_{i+1} L_i M_i y = M_{i+1} M_i y    since M_{i+1} projects onto a subspace of R(E_i)    (25)
                      = M_{i+1} y       since R(M_{i+1}) is a subspace of R(M_i)    (26)

On the other hand, by applying Lemma 3 recursively, we see that none of t_1, t_2, …, t_i belongs to R(E_i) whereas t_{i+1} does, so

    R(N_{i+1}) = R(T_{i+1}) ∩ R(E_i) = R(t_{i+1}) ∩ R(E_i) = R(t_{i+1})    (27)

and the second term of the sum in (24) has the form αt_{i+1} for some α. This α is positive since
    t_{i+1}^T L_i M_i y = f_i^T E_i E_i^T L_i M_i y / √(f_i^T E_i E_i^T f_i)    (28)

                        = f_i^T E_i E_i^T f_i / √(f_i^T E_i E_i^T f_i)    (by (17)-(22))    (29)
which is positive. Premultiplying (24) by E_{i+1}^T and applying Lemma 4, we obtain

    0 = E_{i+1}^T M_{i+1} y + α E_{i+1}^T t_{i+1}    (30)
      = E_i^T M_i y + α E_i^T t_i    (31)
      = E_i^T f_i + α E_i^T t_i    (32)

where (31) is a simple change of label and (32) follows from (17)–(23). Hence we obtain E_i^T f_i = -α E_i^T t_i and the theorem is proved. □
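Theorem 1 is also easy to confirm numerically. The check below is our own and assumes the two hypothetical sketches pls_orthogonal_loadings and pls_fast_orthogonal_loadings from Section 2 are in scope:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((40, 10)); X -= X.mean(axis=0)
    y = rng.standard_normal(40); y -= y.mean()

    P_std, T_std = pls_orthogonal_loadings(X, y, r=5)
    P_new, T_new = pls_fast_orthogonal_loadings(X, y, r=5)

    # Identical loadings and scores up to floating-point error, as Theorem 1 asserts.
    print(np.max(np.abs(P_std - P_new)), np.max(np.abs(T_std - T_new)))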

3. DISCUSSION
It is clear that the method presented here has advantages in terms of speed of computation. The
number of necessary numerical operations is also smaller than that of the orthogonal scores
algorithm, since the latter needs an extra step to compute p_i. The total flop counts for the orthogonal scores and orthogonal loadings algorithms are6
    (9n + 5 + 2r)rk + 7nr + 2r^3/3 + 3r^2/2 + 17r/6    (33)

and
    (7n + 5)rk + (2r^3/3 + 13r^2/2 + 41r/6)n - r^4/6 - r^3 + 19r^2/6 + 6r    (34)

respectively, whereas for the algorithm (5)–(8), including the last step to compute the regression coefficients, the total flop count is

    (7n + 5)rk + 9nr - 2r^3/3 + 2(n - 1)r^2 + 26r/3    (35)

Table 1 presents the numbers of flops for some selected values of n, k and r. The computational
advantages over the classical orthogonal loadings method are more substantial for large n and
for a large number of factors. Compared with the orthogonal scores method, it is significantly
faster for large k. The speed-up in computation will help most when applying computer-intensive methods such as cross-validation.
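The entries of Table 1 can be reproduced directly from expressions (33)-(35). The short script below is our own transcription of those expressions, assuming they are stated correctly above; the function names are ours.

    def flops_os(n, k, r):
        """Orthogonal scores count, expression (33)."""
        return (9*n + 5 + 2*r)*r*k + 7*n*r + 2*r**3/3 + 3*r**2/2 + 17*r/6

    def flops_ol(n, k, r):
        """Orthogonal loadings count, expression (34)."""
        return (7*n + 5)*r*k + (2*r**3/3 + 13*r**2/2 + 41*r/6)*n - r**4/6 - r**3 + 19*r**2/6 + 6*r

    def flops_new(n, k, r):
        """New algorithm count, expression (35)."""
        return (7*n + 5)*r*k + 9*n*r - 2*r**3/3 + 2*(n - 1)*r**2 + 26*r/3

    # First row of Table 1 (k = 50, n = 30, r = 4), in thousands of flops:
    print([round(f(n=30, k=50, r=4) / 1000) for f in (flops_os, flops_ol, flops_new)])   # [58, 48, 45]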
Table 1. Number of flops (in thousands) of standard orthogonal scores (OS), orthogonal loadings (OL)
and new algorithm (NA) and percentage of savings of new algorithm compared with standard orthogonal
scores and orthogonal loadings

k n r OS OL NA % gain over OS % gain over OL

50 30 4 58 48 45 21·8 6·6
7 103 93 80 22·5 13·8
10 150 147 115 23·3 21·4
50 4 94 80 74 21·0 6·7
7 167 153 132 20·9 14·0
10 242 244 191 20·9 21·8
100 4 185 158 148 20·3 6·7
7 327 306 263 19·7 14·1
10 470 489 381 19·1 22·1
100 30 4 114 91 88 22·9 3·5
7 204 168 155 24·0 7·6
10 298 254 223 25·2 12·3
50 4 187 151 145 22·1 3·5
7 331 278 256 22·6 7·7
10 479 422 369 23·1 12·6
100 4 368 299 289 21·5 3·5
7 649 552 509 21·5 7·8
10 933 841 733 21·4 12·8
300 30 4 341 263 260 23·6 1·2
7 609 469 456 25·1 2·7
10 888 684 653 26·5 4·6
50 4 557 435 429 22·9 1·2
7 988 775 753 23·7 2·8
10 1429 1132 1079 24·5 4·7
100 4 1098 863 853 22·4 1·2
7 1935 1539 1496 22·7 2·8
10 2783 2251 2143 23·0 4·8

However, the mathematical equivalence of the two orthogonal loadings algorithms does not necessarily imply that the answers will be the same. There is always numerical error which can rapidly accumulate. Hence one must also examine the numerical stability. To do so, we implemented the method with some simulated data using a GAUSS program on a PC. We also tried it on some real-life data coming from a near-infrared calibration experiment. The differences between the loadings and the scores of the standard and the new method were of the order of the machine accuracy. Although small-scale trials cannot be conclusive, the method performed satisfactorily and we would feel confident to use it.

ACKNOWLEDGEMENTS

I would like to thank Sijmen de Jong for his encouragement and a referee for useful comments.

REFERENCES
1. I. S. Helland, Commun. Stat.—Simul. Comput. 17, 581–607 (1988).
2. H. Martens and T. Naes, Multivariate Calibration, Wiley, Chichester (1989).
3. T. Naes, C. Irgens and H. Martens, Appl. Stat. 35, 195–206 (1986).
4. S. Wold, H. Martens and H. Wold, Proc. Conf. on Matrix Pencils, pp. 286–293, Springer-Verlag, Heidelberg (1983).
5. T. Naes and H. Martens, Commun. Stat.—Simul. Comput. 14, 545–576 (1985).
6. M. C. Denham, Stat. Comput. 5, 191–202 (1995).
7. SIMCA-S 5.1, Umetri AB, Umeå (1995).
8. UNSCRAMBLER 6.0, Camo AS, Trondheim (1995).
