Play it again, Sam. [Woody Allen, 1972]
Kiss me. Kiss me as if it were the last time. [Ilsa Lund Laszlo, Casablanca, 1942]
We now consider the PLS2 algorithm. We assume that $X$ and $Y$ are centered
data matrices. As already mentioned, one may use PLS1 separately for each
analyte ($Y$-column), which allows a separate optimal model to be constructed
for each analyte. It may, however, be advantageous to include information from
other analytes when predicting any specific analyte. This may be done by
constructing an overall model describing $Y$ as a function of $X$, and for
this purpose we may use the PLS2 method.
When several analytes are to be predicted simultaneously, the situation
becomes more complicated than for the PLS1 algorithm. Suffice it to say that
separate application of the PLS1 algorithm to each column of $Y$ would lead to
a different set of scores being formed for each $Y$-column. In PLS2, these
separate scores are in effect reconciled into a single set of scores, but this
extra constraint implies a more complex algorithm. Note that such a
complication does not arise in connection with PCR, because PCR does not take
$Y$ into account when forming the scores.
The principle behind the PLS2 algorithm may be outlined as follows. Similar
to PLS1, we form a model for $X$ as in (7.3), namely

$$X = T P^{\top} + E \qquad (8.1)$$

when $A$ scores are to be used. The scores (columns of $T$) are the single set
of scores alluded to above. But now we form a similar model for $Y$, namely

$$Y = U Q^{\top} + F \qquad (8.2)$$

This includes a second set of scores for $Y$, namely the columns of $U$. These
two equations are linked by an inner relationship,

$$U = T B + H \qquad (8.3)$$

meaning a relationship that holds between latent, rather than observed,
variables. The two matrices $T$ and $U$ are both $n \times A$, and $T$ has
orthogonal columns. $P$ is a $p \times A$ matrix, $Q$ is an $m \times A$
matrix whose columns are unit vectors, and $B$ is an $A \times A$ diagonal
matrix of regression coefficients (here $n$ denotes the number of calibration
samples, $p$ the number of $X$-variables and $m$ the number of analytes).
Similar to PLS1, we will also need the orthogonal matrix $W$.
The three `error' terms $E$, $F$ and $H$ are supposed to represent noise.
Hence $A$ should be chosen large enough to make the term $E$ useless for
predicting $Y$; in other words, $E$ and $Y$ should be approximately
uncorrelated. The relations (8.1), (8.2) and (8.3) are not enough, in
themselves, to define the method, which will again be defined by the actual
algorithm.
Ignoring the error terms in (8.2) and (8.3), and using the estimated value of
$B$, we obtain the predicted value of $Y$ as follows:

$$\hat{Y} = T B Q^{\top} \qquad (8.4)$$

As before, $Y$ is predicted by means of the $X$-scores in $T$, but the
$Y$-loadings $Q$ now enter the prediction equation as well.
Since the PLS2 algorithm is based on the covariance between $X$ and $Y$, a
$Y$-column with large elements relative to the remaining columns will receive
a correspondingly larger weight in the algorithm. Since the $Y$-columns
normally represent different analytes, which may be measured on different
scales, it is advisable to scale the columns of $Y$ properly, for example by
autoscaling. Similar remarks apply in principle to $X$, but usually $X$ is a
spectral matrix, where autoscaling is not necessary.
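The autoscaling mentioned above can be sketched as follows. This is a minimal illustration, assuming that autoscaling means centering each column and dividing by its sample standard deviation; the function name `autoscale` and the small data matrix are hypothetical and not taken from the text.

```python
import numpy as np

def autoscale(M):
    """Center each column and scale it to unit sample standard deviation.

    Returns the scaled matrix together with the column means and standard
    deviations, which are needed to transform new samples the same way.
    Assumes no column is constant (otherwise the division would fail).
    """
    M = np.asarray(M, dtype=float)
    mean = M.mean(axis=0)
    std = M.std(axis=0, ddof=1)
    return (M - mean) / std, mean, std

# Hypothetical Y with two analytes measured on very different scales.
Y_raw = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [4.0, 400.0]])
Y_scaled, y_mean, y_std = autoscale(Y_raw)
```

After scaling, both analytes have the same variance, so neither dominates the covariance with $X$ merely because of its unit of measurement.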
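After this overture, we now proceed to describe the actual PLS2 algorithm,
which, like the NIPALS algorithm, is iterative, rather than just recursive.
The algorithm starts with the initialization $X_0 = X$, $Y_0 = Y$ and $a = 1$,
and then proceeds through the following steps to find the first $A$ terms:
1. The vector $u_a$ is initialized to be an arbitrary column of $Y_{a-1}$.
2. Let $w_a = X_{a-1}^{\top} u_a / \| X_{a-1}^{\top} u_a \|$.
3. Let $t_a = X_{a-1} w_a$.
4. Let $q_a = Y_{a-1}^{\top} t_a / \| Y_{a-1}^{\top} t_a \|$.
5. Let $u_a = Y_{a-1} q_a$.
6. If $u_a$ is unchanged continue with Step 7; otherwise go back to Step 2.
7. Let $b_a = u_a^{\top} t_a / (t_a^{\top} t_a)$.
8. Let $p_a = X_{a-1}^{\top} t_a / (t_a^{\top} t_a)$.
9. Let $X_a = X_{a-1} - t_a p_a^{\top}$ and $Y_a = Y_{a-1} - b_a t_a q_a^{\top}$.
10. Stop if $a = A$; otherwise let $a = a + 1$ and return to Step 1.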
Now form the matrices $T$, $U$, $W$, $P$ and $Q$ with columns $t_a$, $u_a$,
$w_a$, $p_a$ and $q_a$, respectively, and form the $A \times A$ diagonal
coefficient matrix $B$ with diagonal elements $b_a$. After $A$ runs through
the algorithm, the relations (8.1), (8.2), (8.3) and (8.4) are satisfied.
In the special case $m = 1$, PLS2 reduces to PLS1, because then $q_a$ in
Step 4 is 1, and $u_a = y_{a-1}$ in Step 5.
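The ten steps can be sketched in NumPy as shown here. This is an illustrative sketch rather than a reference implementation: it assumes the usual NIPALS form of PLS2 (unit-length $w_a$ and $q_a$, deflation of both $X$ and $Y$ by $t_a$), and the function name `pls2_fit`, the tolerance `tol`, the iteration cap `max_iter`, the choice of starting column and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def pls2_fit(X, Y, A, tol=1e-10, max_iter=500):
    """Sketch of PLS2 for centered X (n x p) and Y (n x m) with A components."""
    Xa, Ya = np.array(X, dtype=float), np.array(Y, dtype=float)
    T, U, W, P, Q, b = [], [], [], [], [], []
    for a in range(A):
        # Step 1: start from a column of Y (here: the one with largest norm).
        u = Ya[:, np.argmax((Ya ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            w = Xa.T @ u
            w /= np.linalg.norm(w)            # Step 2: unit weight vector
            t = Xa @ w                        # Step 3: X-scores
            q = Ya.T @ t
            q /= np.linalg.norm(q)            # Step 4: unit Y-loadings
            u_new = Ya @ q                    # Step 5: Y-scores
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:                          # Step 6: u is unchanged
                break
        tt = t @ t
        b_a = (u @ t) / tt                    # Step 7: regress u on t
        p = Xa.T @ t / tt                     # Step 8: regress X on t
        Xa = Xa - np.outer(t, p)              # Step 9: deflate X ...
        Ya = Ya - b_a * np.outer(t, q)        # ... and Y
        for store, v in zip((T, U, W, P, Q, b), (t, u, w, p, q, b_a)):
            store.append(v)
    T, U, W, P, Q = (np.column_stack(M) for M in (T, U, W, P, Q))
    return T, U, W, P, Q, np.diag(b), Xa, Ya

# Synthetic centered data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
X -= X.mean(axis=0)
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(20, 2))
Y -= Y.mean(axis=0)

T, U, W, P, Q, B, E, Y_res = pls2_fit(X, Y, A=3)
Y_hat = T @ B @ Q.T                           # fitted values as in (8.4)
```

The returned residual matrices play the role of $E$ in (8.1) and of $Y - \hat{Y}$, so `X - T @ P.T` and `Y - Y_hat` can be compared with them directly.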
We now comment on the main steps of the PLS2 algorithm. We consider run
number $a$, and assume that convergence has been achieved in Step 6, so that
all equations are satisfied.
Step 2. This step is similar to Step 1 of the PLS1 algorithm. But now the
weights are based on the covariance between $X_{a-1}$ and $u_a$, rather than
between $X_{a-1}$ and the single column $y_{a-1}$. In this way, $u_a$ is used
as a representative of $Y_{a-1}$, a compromise that is necessary in order to
reconcile the information coming from the different columns of $Y_{a-1}$.
Step 3. The score vector $t_a$ is formed as a linear combination of the
columns of $X_{a-1}$ with weights from $w_a$, just as we did in the PLS1
algorithm.
Step 4. The weights $q_a$ are calculated from the covariances between
$Y_{a-1}$ and $t_a$ in much the same way that $w_a$ is calculated from the
covariances between $X_{a-1}$ and $u_a$.
Step 5. The score vector $u_a$ is formed as a linear combination of the
columns of $Y_{a-1}$ with weights from $q_a$; a parallel with Step 3.
Step 7. The regression coefficient $b_a$ is calculated by ordinary linear
regression of $u_a$ on $t_a$.
Step 8. The vector $p_a$ is the vector of regression coefficients obtained
from multiple linear regression of $X_{a-1}$ on $t_a$.
Step 9. The matrix $X_a$ represents the residuals after regressing $X_{a-1}$
on $t_a$, and correspondingly $Y_a$ contains the residuals after regressing
$Y_{a-1}$ on $t_a$ and taking the form (8.3) of $U$ into account. This step
ensures that the $t$-vectors become orthogonal.
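The orthogonality claim in Step 9 can be checked with a short calculation. The sketch assumes the deflation $X_a = X_{a-1} - t_a p_a^{\top}$ with $p_a = X_{a-1}^{\top} t_a / (t_a^{\top} t_a)$ and the score definition $t_{a+1} = X_a w_{a+1}$, which is the usual NIPALS form of Steps 3, 8 and 9.

```latex
\begin{align*}
t_a^{\top} X_a
  &= t_a^{\top} X_{a-1} - (t_a^{\top} t_a)\, p_a^{\top}
   = t_a^{\top} X_{a-1} - t_a^{\top} X_{a-1} = 0, \\
t_a^{\top} t_{a+1}
  &= t_a^{\top} X_a w_{a+1} = 0 .
\end{align*}
```

The same argument, applied by induction over the later deflation steps, gives $t_a^{\top} X_b = 0$ for every $b \ge a$, so that all pairs of score vectors are orthogonal.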
After completing the $A$ runs, the relations (8.1), (8.2) and (8.3) hold, and
the predicted value of $Y$ for the calibration samples is given by (8.4).
Again, the number of scores $A$ should be chosen such that further scores are
extracted only as long as each new latent variable contributes significantly
to the description of $Y$.
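A deeper understanding of the PLS2 method may be obtained by studying the
equations governing the solution obtained by the algorithm after convergence
of Steps 2-6. Applying Steps 3, 4, 5 and 2 in turn, the following sequence
of identities is obtained:

$$X_{a-1}^{\top} Y_{a-1} Y_{a-1}^{\top} X_{a-1} w_a
  = X_{a-1}^{\top} Y_{a-1} Y_{a-1}^{\top} t_a
  = \| Y_{a-1}^{\top} t_a \| \, X_{a-1}^{\top} Y_{a-1} q_a
  = \| Y_{a-1}^{\top} t_a \| \, X_{a-1}^{\top} u_a
  = \lambda_a w_a$$

say, where $\lambda_a = \| Y_{a-1}^{\top} t_a \| \, \| X_{a-1}^{\top} u_a \|$.
This shows that $w_a$ is an eigenvector for the matrix
$X_{a-1}^{\top} Y_{a-1} Y_{a-1}^{\top} X_{a-1}$. Similar arguments show that

$$Y_{a-1}^{\top} X_{a-1} X_{a-1}^{\top} Y_{a-1} q_a = \lambda_a q_a, \qquad
  X_{a-1} X_{a-1}^{\top} Y_{a-1} Y_{a-1}^{\top} t_a = \lambda_a t_a, \qquad
  Y_{a-1} Y_{a-1}^{\top} X_{a-1} X_{a-1}^{\top} u_a = \lambda_a u_a .$$

These eigenvectors are found by the algorithm in much the same way that the
eigenvectors of $X^{\top} X$ are found by the NIPALS algorithm.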
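The eigenvector property can be checked numerically for the first component. The sketch below assumes the inner iteration normalizes the weight and loading vectors to unit length, runs it for a fixed number of passes on synthetic data, and compares the result with a direct eigendecomposition of $X^{\top} Y Y^{\top} X$. The data, the coefficient matrix `C` and the iteration count are assumptions chosen only so that the leading eigenvalue is well separated.

```python
import numpy as np

# Synthetic centered data with one clearly dominant X-Y direction.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
X -= X.mean(axis=0)
C = np.zeros((5, 3))
C[0, 0], C[1, 1], C[2, 2] = 3.0, 1.0, 0.3
Y = X @ C + 0.1 * rng.normal(size=(30, 3))
Y -= Y.mean(axis=0)

# Inner iteration of the first run (weights, X-scores, Y-loadings, Y-scores).
u = Y[:, 0].copy()
for _ in range(500):
    w = X.T @ u
    w /= np.linalg.norm(w)
    t = X @ w
    q = Y.T @ t
    q /= np.linalg.norm(q)
    u = Y @ q

# Compare with the leading eigenpair of the symmetric matrix X'YY'X.
M = X.T @ Y @ Y.T @ X
lam = np.linalg.norm(Y.T @ t) * np.linalg.norm(X.T @ u)
eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
w_top = eigvecs[:, -1]
alignment = abs(w @ w_top)                # 1 when w and w_top coincide up to sign
```

The iteration behaves like a power method, so it converges to the eigenvector belonging to the largest eigenvalue, and the speed of convergence depends on how well that eigenvalue is separated from the next one.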
Prediction for the PLS2 method is quite similar to the case of PLS1, as long
as we take the extra elements of the PLS2 algorithm into account. Consider,
as before, a new prediction sample $x$ ($1 \times p$) and predicted value
$\hat{y}$ ($1 \times m$) (both uncentered), and let $\bar{x}$ and $\bar{y}$
be the calibration sample averages.
In order to follow the steps of the PLS2 algorithm, now for the purpose of
prediction, we initialize by taking $x_0 = x - \bar{x}$ and $a = 1$. The
prediction then proceeds through the following steps:
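1. Let $t_a = x_{a-1} w_a$.
2. Let $x_a = x_{a-1} - t_a p_a^{\top}$.
3. Stop if $a = A$; otherwise let $a = a + 1$, and go back to Step 1.
Now form the row vector $t = (t_1, \ldots, t_A)$, and complete the prediction
as follows:

$$\hat{y} = \bar{y} + t B Q^{\top} .$$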
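It is also possible to write the prediction in matrix form (Bro, 1996, p.
69), as follows:

$$\hat{y} = \bar{y} + (x - \bar{x}) B_{\mathrm{PLS}} ,$$

where the regression matrix $B_{\mathrm{PLS}}$ is

$$B_{\mathrm{PLS}} = W (P^{\top} W)^{-1} B Q^{\top} .$$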
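The two prediction routes can be compared numerically. The sketch below is illustrative only: it assumes the stepwise rule $t_a = x_{a-1} w_a$, $x_a = x_{a-1} - t_a p_a^{\top}$ and the regression matrix $W (P^{\top} W)^{-1} B Q^{\top}$, uses a compact PLS2 fit with a fixed number of inner iterations, and all function names and data are hypothetical.

```python
import numpy as np

def pls2_fit(X, Y, A, n_iter=200):
    """Compact PLS2 sketch (fixed inner iteration count); returns T, W, P, Q, B."""
    Xa, Ya = np.array(X, dtype=float), np.array(Y, dtype=float)
    T, W, P, Q, b = [], [], [], [], []
    for a in range(A):
        u = Ya[:, 0].copy()
        for _ in range(n_iter):
            w = Xa.T @ u
            w /= np.linalg.norm(w)
            t = Xa @ w
            q = Ya.T @ t
            q /= np.linalg.norm(q)
            u = Ya @ q
        tt = t @ t
        b_a = (u @ t) / tt
        p = Xa.T @ t / tt
        Xa = Xa - np.outer(t, p)
        Ya = Ya - b_a * np.outer(t, q)
        for store, v in zip((T, W, P, Q, b), (t, w, p, q, b_a)):
            store.append(v)
    T, W, P, Q = (np.column_stack(M) for M in (T, W, P, Q))
    return T, W, P, Q, np.diag(b)

def predict_stepwise(x, x_mean, y_mean, W, P, Q, B):
    """Follow the prediction steps: score, deflate, repeat, then map to y."""
    xa = x - x_mean
    t = np.empty(W.shape[1])
    for a in range(W.shape[1]):
        t[a] = xa @ W[:, a]
        xa = xa - t[a] * P[:, a]
    return y_mean + t @ B @ Q.T

def regression_matrix(W, P, Q, B):
    """Regression matrix W (P'W)^{-1} B Q', computed with a linear solve."""
    return W @ np.linalg.solve(P.T @ W, B @ Q.T)

# Synthetic, uncentered calibration data and one new sample.
rng = np.random.default_rng(2)
X_raw = rng.normal(size=(25, 6)) + 5.0
Y_raw = X_raw[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(25, 2))
x_mean, y_mean = X_raw.mean(axis=0), Y_raw.mean(axis=0)
T, W, P, Q, B = pls2_fit(X_raw - x_mean, Y_raw - y_mean, A=3)

x_new = rng.normal(size=6) + 5.0
y_step = predict_stepwise(x_new, x_mean, y_mean, W, P, Q, B)
y_mat = y_mean + (x_new - x_mean) @ regression_matrix(W, P, Q, B)
```

Both routes give the same prediction because $P^{\top} W$ is upper triangular with unit diagonal, so solving with it reproduces the score-and-deflate recursion.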
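Similar to what we did for PLS1, we decide on the number of components by
considering the proportion of the total $X$- and $Y$-variance accounted for
by the model. The $X$-variance accounted for by component $a$ is

$$\| t_a p_a^{\top} \|^2 = (t_a^{\top} t_a)(p_a^{\top} p_a) ,$$

or, relatively, as a fraction of total $X$-variance

$$\frac{(t_a^{\top} t_a)(p_a^{\top} p_a)}{\operatorname{tr}(X^{\top} X)} .$$

The cumulative proportion of the total $X$-variance accounted for by the
model with $A$ components is hence

$$\frac{\sum_{a=1}^{A} (t_a^{\top} t_a)(p_a^{\top} p_a)}{\operatorname{tr}(X^{\top} X)} .$$

The proportion of the total $Y$-variance accounted for by the model with $A$
components is

$$\frac{\sum_{a=1}^{A} b_a^2 \, t_a^{\top} t_a}{\operatorname{tr}(Y^{\top} Y)} .$$

The number of components $A$ should be chosen such that the percentage
variation explained is large for both $X$ and $Y$.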
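A brief sketch of why these quantities behave as proportions, assuming the deflation steps $X_a = X_{a-1} - t_a p_a^{\top}$ and $Y_a = Y_{a-1} - b_a t_a q_a^{\top}$ with unit vectors $q_a$, converged inner iterations, and residual matrices $X_A$ and $Y_A$ after $A$ runs. Because the score vectors are mutually orthogonal and orthogonal to both residual matrices, all cross terms vanish and the total sums of squares split into a model part and a residual part:

```latex
\begin{align*}
\operatorname{tr}(X^{\top} X)
  &= \sum_{a=1}^{A} (t_a^{\top} t_a)(p_a^{\top} p_a)
     + \operatorname{tr}(X_A^{\top} X_A), \\
\operatorname{tr}(Y^{\top} Y)
  &= \sum_{a=1}^{A} b_a^{2}\, t_a^{\top} t_a
     + \operatorname{tr}(Y_A^{\top} Y_A).
\end{align*}
```

Dividing each line by its left-hand side shows that the explained fractions lie between 0 and 1 and can only increase as components are added.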