Rick: I congratulate you. Victor Laszlo: What for? Rick: Your work. Victor:
I try. Rick: We all try. You succeed. [Casablanca, 1942]
The PCR method from the previous module represents a considerable
improvement over MLR and CLS. By using latent variables (scores), it is
possible to use a large number of variables (frequencies), just as in CLS,
but without having to know about all interferences.
Problems may arise, however, if there is a lot of variation in $\mathbf{X}$
that is not due to the analyte as such. PCR finds, somewhat uncritically,
those latent variables that describe as much as possible of the variation in
$\mathbf{X}$. But sometimes the analyte itself gives rise to only small
variations in $\mathbf{X}$, and if the interferences vary a lot, then
the latent variables found by PCR may not be particularly good at describing
$\mathbf{y}$. In the worst case, important information may be hidden in
directions in the $\mathbf{X}$-space that PCR interprets as noise and
therefore leaves out.
Partial Least Squares Regression (PLS) is able to cope better with this
problem, by forming variables that are relevant for describing
$\mathbf{y}$. See Examples for a motivating example.
We now consider the general form of the PLS1 algorithm. We assume that
$\mathbf{X}$ is an $n \times p$ centered data matrix and $\mathbf{y}$ an
$n \times 1$ centered data vector. The so-called PLS2 algorithm considered
in Module 8 may be used for the case of more than one column in
$\mathbf{Y}$. The PLS2 algorithm, however, is more complicated than PLS1, and
even when several columns are available in $\mathbf{Y}$, it may be preferable
to apply PLS1 separately to each column of $\mathbf{Y}$. On the other
hand, PLS2 may be better for initial, more exploratory investigations, or in
cases where the different analytes show covariation.
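The PLS1 algorithm starts with the initialization
$\mathbf{X}_1 = \mathbf{X}$, $\mathbf{y}_1 = \mathbf{y}$ and $a = 1$.
The algorithm then proceeds through the following steps to find the first
$A$ latent variables:
- Step 1: Let $\mathbf{w}_a = \mathbf{X}_a^\top \mathbf{y}_a / \lVert \mathbf{X}_a^\top \mathbf{y}_a \rVert$.
- Step 2: Let $\mathbf{t}_a = \mathbf{X}_a \mathbf{w}_a$.
- Step 3: Let $q_a = \mathbf{y}_a^\top \mathbf{t}_a / (\mathbf{t}_a^\top \mathbf{t}_a)$.
- Step 4: Let $\mathbf{p}_a^\top = \mathbf{t}_a^\top \mathbf{X}_a / (\mathbf{t}_a^\top \mathbf{t}_a)$.
- Step 5: Let $\mathbf{X}_{a+1} = \mathbf{X}_a - \mathbf{t}_a \mathbf{p}_a^\top$ and $\mathbf{y}_{a+1} = \mathbf{y}_a - \mathbf{t}_a q_a$.
- Step 6: Stop if $a = A$; otherwise let $a = a + 1$ and return to Step 1.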
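Now form the two $p \times A$ matrices $\mathbf{W}$ and $\mathbf{P}$ and the
$n \times A$ matrix $\mathbf{T}$ with columns $\mathbf{w}_a$, $\mathbf{p}_a$
and $\mathbf{t}_a$, respectively, and form a column vector $\mathbf{q}$
($A \times 1$) with elements $q_a$. Let
$\hat{\mathbf{X}} = \mathbf{T}\mathbf{P}^\top$ and
$\hat{\mathbf{y}} = \mathbf{T}\mathbf{q}$,
which are the predicted values of $\mathbf{X}$ and $\mathbf{y}$,
respectively. The matrix $\mathbf{W}$ has orthonormal columns, and
$\mathbf{T}$ has orthogonal columns.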
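
The PLS1 steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pls1_fit`, the simulated data and the choice of three latent variables are all hypothetical, and the data are centered before the function is called.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 on a centered X (n x p) and a centered y (length n)."""
    Xa = np.asarray(X, dtype=float).copy()
    ya = np.asarray(y, dtype=float).copy()
    n, p = Xa.shape
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        v = Xa.T @ ya
        w = v / np.linalg.norm(v)      # Step 1: unit weight vector
        t = Xa @ w                     # Step 2: score vector
        tt = t @ t
        qa = (ya @ t) / tt             # Step 3: regress y_a on t_a
        pa = (Xa.T @ t) / tt           # Step 4: regress columns of X_a on t_a
        Xa = Xa - np.outer(t, pa)      # Step 5: deflate X ...
        ya = ya - qa * t               # ... and y
        W[:, a], P[:, a], T[:, a], q[a] = w, pa, t, qa
    return W, P, T, q, Xa, ya

# Simulated calibration data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=30)
X = X - X.mean(axis=0)
y = y - y.mean()

W, P, T, q, X_res, y_res = pls1_fit(X, y, 3)
X_hat = T @ P.T   # predicted X
y_hat = T @ q     # predicted y
```

The loop runs a fixed number of times, which reflects the non-iterative character of PLS1. The returned residuals `X_res` and `y_res` are what remains of the data after the last deflation in Step 5.
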
The PLS1 algorithm is used here in order to define the method, although
there are alternative ways of organizing the computations; see Bro (1996, p.
57). Note that, in spite of the similarities with the NIPALS algorithm, the
PLS1 algorithm is recursive and requires exactly $A$ steps, whereas the
NIPALS algorithm is iterative: its number of iterations cannot be determined
in advance and depends on the choice of a stopping criterion. In this
sense, the PLS1 algorithm is simpler than the NIPALS algorithm.
We now comment on each step of the PLS1 algorithm in turn. For simplicity,
we explain the first run of the algorithm ($a = 1$), and then go on to explain
the general case.
Step 1. In PLS, we seek the direction in the space of $\mathbf{X}$ that
yields the biggest covariance between $\mathbf{X}$ and $\mathbf{y}$. This
direction is given by a unit vector $\mathbf{w}_1$, and is such that large
variations in the $\mathbf{X}$-values are accompanied by large variations in
the corresponding $\mathbf{y}$-values. The unit vector $\mathbf{w}_1$ is thus
formed by standardizing the vector of covariances between the columns of
$\mathbf{X}$ and $\mathbf{y}$, which is proportional to
$\mathbf{X}^\top \mathbf{y}$. A further interpretation of $\mathbf{w}_1$ is
that its transpose $\mathbf{w}_1^\top$ is proportional to the CLS regression
coefficient $\mathbf{y}^\top \mathbf{X} / (\mathbf{y}^\top \mathbf{y})$.
It may hence be useful, for diagnostic purposes, to compare $\mathbf{w}_1$
with any prior knowledge about the spectrum of the pure analyte,
although possible interferences may obscure this picture.
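
The two interpretations in Step 1 can be checked numerically. The sketch below uses simulated data (an assumption made only for illustration) and hypothetical names such as `score_cov`; it compares the covariance achieved by the normalized $\mathbf{X}^\top \mathbf{y}$ direction with that of random unit vectors, and measures the angle between that direction and the CLS coefficient.

```python
import numpy as np

# Simulated, centered calibration data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=40)
X = X - X.mean(axis=0)
y = y - y.mean()

v = X.T @ y
w1 = v / np.linalg.norm(v)   # Step 1: unit weight vector
k_hat = v / (y @ y)          # CLS regression coefficient, written as a column

def score_cov(u):
    """Sample covariance between the score X u and y (both centered)."""
    return (X @ u) @ y / (len(y) - 1)

best = score_cov(w1)

# Covariances obtained with 1000 random unit directions, for comparison.
U = rng.normal(size=(1000, 6))
U /= np.linalg.norm(U, axis=1, keepdims=True)
others = np.array([score_cov(u) for u in U])

# Cosine of the angle between w1 and the CLS coefficient.
cosine = (w1 @ k_hat) / np.linalg.norm(k_hat)
```

By the Cauchy-Schwarz inequality no unit vector can give a larger covariance than `w1`, so every entry of `others` is at most `best`, and `cosine` equals one up to rounding because the two vectors differ only by a positive scale factor.
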
Step 2. The $n \times 1$ score vector $\mathbf{t}_1$ is formed
as a linear combination of the columns of $\mathbf{X}$ with weights
$\mathbf{w}_1$. As explained above, the relative weights are given by
the covariances between $\mathbf{y}$ and each of the columns of
$\mathbf{X}$, and $\mathbf{t}_1$ may hence be understood as the
best linear combination of the columns of $\mathbf{X}$ for the purpose
of predicting $\mathbf{y}$. The latent vectors $\mathbf{t}_a$ are
also called scores, similar to the terminology for PCA.
Step 3. The regression coefficient $q_1$ is calculated
by ordinary linear regression of $\mathbf{y}$ on $\mathbf{t}_1$.
Step 4. The $p \times 1$ vector $\mathbf{p}_1$ is the
transpose of the vector of regression coefficients obtained from simple
linear regressions of the columns of $\mathbf{X}$ on $\mathbf{t}_1$.
Step 5. The $n \times 1$ vector $\mathbf{y}_2$ represents the residuals after
regressing $\mathbf{y}$ on $\mathbf{t}_1$, and correspondingly,
$\mathbf{X}_2$ are the residuals after regressing $\mathbf{X}$ on
$\mathbf{t}_1$. This step ensures that the $\mathbf{t}$-vectors become
orthogonal (just as the corresponding $\mathbf{t}_a$ are in PCR), and thus
ensures that the multiple regression of $\mathbf{y}$ on $\mathbf{T}$ can be
calculated one column at a time, as done in Step 3.
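
The role of the deflation in Step 5 can be illustrated with a short numerical check. In this sketch (simulated data and the helper name `pls1_scores` are assumptions), the coefficients obtained one column at a time in Step 3 are compared with a joint least-squares regression of $\mathbf{y}$ on all score vectors at once.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """Return the scores T and coefficients q from PLS1 Steps 1-5."""
    Xa, ya = X.copy(), y.copy()
    T = np.zeros((X.shape[0], n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        v = Xa.T @ ya
        w = v / np.linalg.norm(v)
        t = Xa @ w
        q[a] = (ya @ t) / (t @ t)      # Step 3: one coefficient at a time
        p = (Xa.T @ t) / (t @ t)
        Xa = Xa - np.outer(t, p)       # Step 5: deflation of X
        ya = ya - q[a] * t             # Step 5: deflation of y
        T[:, a] = t
    return T, q

# Simulated, centered calibration data (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 7))
y = X @ rng.normal(size=7) + 0.2 * rng.normal(size=25)
X, y = X - X.mean(axis=0), y - y.mean()

T, q = pls1_scores(X, y, 4)

# Off-diagonal part of T'T; zero when the score vectors are orthogonal.
gram = T.T @ T
off_diagonal = gram - np.diag(np.diag(gram))

# Joint multiple regression of y on all columns of T.
q_joint, *_ = np.linalg.lstsq(T, y, rcond=None)
```

Because the score vectors are mutually orthogonal, `q_joint` agrees with the stepwise `q` up to rounding error.
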
After the first run through Steps 1-5, the procedure is repeated using the
residuals $\mathbf{X}_2$ and $\mathbf{y}_2$. The algorithm then
finds the best linear combination of the columns of $\mathbf{X}_2$ for
the purpose of predicting $\mathbf{y}_2$, thus picking up any further
structure in the connection between $\mathbf{X}$ and $\mathbf{y}$
not accounted for by $\mathbf{t}_1$. This is repeated, so
that each run of the algorithm in principle reveals more
information about the connection between $\mathbf{X}$ and
$\mathbf{y}$. Just as for PCR, the information accounted for usually
becomes less and less with each step taken.
After the $A$ runs have been completed, the following relations hold:
$$\mathbf{X} = \mathbf{T}\mathbf{P}^\top + \mathbf{X}_{A+1}, \qquad \mathbf{y} = \mathbf{T}\mathbf{q} + \mathbf{y}_{A+1}.$$
The number of scores $A$ should hence, in principle, be chosen such that
$\mathbf{X}_{A+1}$ contains no further information about
$\mathbf{y}_{A+1}$, or in other words, such that $\mathbf{X}_{A+1}$
and $\mathbf{y}_{A+1}$ are approximately uncorrelated with each other. In the
extreme case where $\mathbf{X}_a^\top \mathbf{y}_a$ becomes
zero, the algorithm is stopped prematurely. In summary, further scores
should be extracted only as long as each new variable contributes
significantly to the description of $\mathbf{y}$. Criteria for deciding
when this is the case will be discussed later, in Module 13.
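
As a rough illustration of how the residuals evolve, the sketch below tracks, after each run, the norm of the cross-product between the current residuals and the residual sum of squares of $\mathbf{y}$. The function name `pls1_residual_path`, the tolerance and the simulated data are assumptions; this is a diagnostic sketch, not one of the selection criteria of Module 13.

```python
import numpy as np

def pls1_residual_path(X, y, max_components, tol=1e-12):
    """Track ||X_a' y_a|| and y_a' y_a before the first run and after each run."""
    Xa, ya = X.copy(), y.copy()
    cov_norms = [np.linalg.norm(Xa.T @ ya)]
    y_rss = [ya @ ya]
    for a in range(max_components):
        v = Xa.T @ ya
        norm_v = np.linalg.norm(v)
        if norm_v <= tol:
            break                      # X_a' y_a is zero: stop prematurely
        w = v / norm_v
        t = Xa @ w
        q = (ya @ t) / (t @ t)
        p = (Xa.T @ t) / (t @ t)
        Xa = Xa - np.outer(t, p)
        ya = ya - q * t
        cov_norms.append(np.linalg.norm(Xa.T @ ya))
        y_rss.append(ya @ ya)
    return np.array(cov_norms), np.array(y_rss)

# Simulated, centered calibration data (illustrative only).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)
X, y = X - X.mean(axis=0), y - y.mean()

cov_norms, y_rss = pls1_residual_path(X, y, 4)
```

The residual sum of squares in `y_rss` cannot increase from one run to the next, since each run removes the part of the current residual explained by the new score vector.
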
Prediction for the PLS1 method is slightly more complicated than for PCR,
in spite of the algorithm being simpler. Consider a new prediction sample
$\mathbf{x}^\top$ ($1 \times p$ vector) and predicted value $\hat{y}$
(both uncentered). Note the new notation for the predicted
value. Let $\bar{\mathbf{x}}^\top$ ($1 \times p$) and $\bar{y}$ be
the calibration sample averages. The prediction is performed by essentially
retracing the steps of the algorithm, letting the row vector
$\mathbf{x}^\top$ follow the same steps as a row of the $\mathbf{X}$
matrix.
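Let $\mathbf{W}$, $\mathbf{P}$, $\mathbf{T}$ and $\mathbf{q}$
be the matrices and vector formed after applying the PLS1
algorithm to the calibration data. Initialize by taking
$\mathbf{x}_1^\top = \mathbf{x}^\top - \bar{\mathbf{x}}^\top$ and $a = 1$.
Then proceed through the following steps:
- Step 1: Let $t_a = \mathbf{x}_a^\top \mathbf{w}_a$.
- Step 2: Let $\mathbf{x}_{a+1}^\top = \mathbf{x}_a^\top - t_a \mathbf{p}_a^\top$.
- Step 3: Let $a = a + 1$, and repeat Steps 1 to 3 until $a > A$.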
Now form the row vector $\mathbf{t}^\top = (t_1, \ldots, t_A)$,
and complete the prediction as follows:
$$\hat{y} = \bar{y} + \mathbf{t}^\top \mathbf{q}.$$
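
The stepwise prediction can be sketched as follows. The names `pls1_fit` and `pls1_predict_stepwise` and the simulated data are assumptions made for illustration; the new sample is passed through the same weighting and deflation steps as a row of the calibration matrix.

```python
import numpy as np

def pls1_fit(Xc, yc, n_components):
    """PLS1 on centered data; returns W, P, T and q."""
    Xa, ya = Xc.copy(), yc.copy()
    n, p = Xa.shape
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    T, q = np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        v = Xa.T @ ya
        W[:, a] = v / np.linalg.norm(v)
        T[:, a] = Xa @ W[:, a]
        tt = T[:, a] @ T[:, a]
        q[a] = (ya @ T[:, a]) / tt
        P[:, a] = (Xa.T @ T[:, a]) / tt
        Xa = Xa - np.outer(T[:, a], P[:, a])
        ya = ya - q[a] * T[:, a]
    return W, P, T, q

def pls1_predict_stepwise(x_new, x_bar, y_bar, W, P, q):
    """Predict y for one uncentered sample by retracing the PLS1 steps."""
    xa = x_new - x_bar                 # initialization: center the sample
    t = np.zeros(len(q))
    for a in range(len(q)):
        t[a] = xa @ W[:, a]            # score of the new sample
        xa = xa - t[a] * P[:, a]       # deflate the sample
    return y_bar + t @ q

# Simulated, uncentered calibration data (illustrative only).
rng = np.random.default_rng(4)
X = 5.0 + rng.normal(size=(30, 6))
y = 1.0 + X @ rng.normal(size=6) + 0.1 * rng.normal(size=30)
x_bar, y_bar = X.mean(axis=0), y.mean()

W, P, T, q = pls1_fit(X - x_bar, y - y_bar, 3)
y_pred_first = pls1_predict_stepwise(X[0], x_bar, y_bar, W, P, q)
```

Applied to a calibration sample, the stepwise procedure reproduces that sample's row of the score matrix, so `y_pred_first` coincides with the fitted value `y_bar + T[0] @ q`.
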
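It is possible, though, to summarize the prediction in a matrix formula
(Bro, 1996, pp. 62-63), as follows:
$$\hat{y} = \bar{y} + (\mathbf{x}^\top - \bar{\mathbf{x}}^\top)\,\mathbf{b},$$
where $\mathbf{b}$, the so-called regression vector, is
$$\mathbf{b} = \mathbf{W}(\mathbf{P}^\top \mathbf{W})^{-1}\mathbf{q}.$$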
There is, however, a slight disadvantage to this method, because useful
information contained in the individual latent variables $t_a$
is not available here. Like the regression matrix in the previous methods,
the regression vector $\mathbf{b}$ contains useful information
about which areas (frequencies) contribute to the prediction.
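
A regression vector of the standard form $\mathbf{b} = \mathbf{W}(\mathbf{P}^\top \mathbf{W})^{-1}\mathbf{q}$ can be computed and checked against the fitted values from the scores. The sketch below rests on assumptions: the function name `pls1_fit` and the simulated data are hypothetical, and a linear solve is used instead of an explicit matrix inverse.

```python
import numpy as np

def pls1_fit(Xc, yc, n_components):
    """PLS1 on centered data; returns W, P, T and q."""
    Xa, ya = Xc.copy(), yc.copy()
    n, p = Xa.shape
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    T, q = np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        v = Xa.T @ ya
        W[:, a] = v / np.linalg.norm(v)
        T[:, a] = Xa @ W[:, a]
        tt = T[:, a] @ T[:, a]
        q[a] = (ya @ T[:, a]) / tt
        P[:, a] = (Xa.T @ T[:, a]) / tt
        Xa = Xa - np.outer(T[:, a], P[:, a])
        ya = ya - q[a] * T[:, a]
    return W, P, T, q

# Simulated, uncentered calibration data (illustrative only).
rng = np.random.default_rng(5)
X = 3.0 + rng.normal(size=(30, 6))
y = 2.0 + X @ rng.normal(size=6) + 0.1 * rng.normal(size=30)
x_bar, y_bar = X.mean(axis=0), y.mean()

W, P, T, q = pls1_fit(X - x_bar, y - y_bar, 3)

# Regression vector b = W (P'W)^{-1} q, computed with a linear solve.
b = W @ np.linalg.solve(P.T @ W, q)

y_pred_matrix = y_bar + (X - x_bar) @ b   # matrix-formula prediction
y_pred_scores = y_bar + T @ q             # prediction via the scores
```

The two prediction vectors agree up to rounding error. Plotting `b` against the variable index (for instance, frequency) is one way to inspect which areas contribute to the prediction.
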