You're saying this only to make me go. [Ilsa Lund Laszlo, Casablanca,
1942]
In this module we turn to the Principal Components Regression (PCR) method,
in which the PCA (Principal Components Analysis) method from the previous
module is put to work in regression. To this end we consider the principal
components of $X$, where $X$ is a centered $n \times k$ data matrix.
There are several ways of finding the principal components of the $X$
matrix. One possibility is to apply the SVD method to $X$, writing the
reduced form of SVD as follows:
$$X = U D V^\top,$$
where $U$ ($n \times r$) and $V$ ($k \times r$) are orthogonal matrices
(that is, matrices with orthonormal columns) corresponding to the $r$
non-zero singular values on the diagonal of $D$, in the notation of
Module 5.
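Let the scores matrix be defined by
$$T = U D,$$
a matrix with orthogonal, but not necessarily orthonormal columns. In fact
$$T^\top T = D U^\top U D = D^2 = \Lambda,$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$ contains the
non-zero eigenvalues of $X^\top X$ in its diagonal. We assume that the
eigenvalues are in decreasing order,
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$.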
Since, with $P = V$,
$$X = T P^\top, \tag{6.1}$$
we find that
$$X^\top X = P T^\top T P^\top = P \Lambda P^\top,$$
which is the spectral decomposition for $X^\top X$, except that columns of
$P$ corresponding to zero eigenvalues have been left out. By using that
$P$ is orthogonal, we may also write (6.1) as follows:
$$T = X P, \tag{6.2}$$
which follows by noting that $P^\top P = I$. Recall from Module 5 that the
columns of $T$ are known as scores, and those of $P$ as loadings.
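The relations above can be checked numerically. The following is a minimal
sketch, assuming NumPy and a small synthetic data matrix (the data, the
seed and the variable names are illustrative and not part of the module).

```python
import numpy as np

rng = np.random.default_rng(0)          # synthetic data, for illustration only
n, k = 8, 5
X = rng.normal(size=(n, k))
X = X - X.mean(axis=0)                  # center the columns

# Reduced SVD: X = U diag(d) V^T
U, d, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(d > 1e-10 * d[0]))       # number of non-zero singular values
U, d, Vt = U[:, :r], d[:r], Vt[:r, :]

T = U * d                               # scores:   T = U D
P = Vt.T                                # loadings: P = V
lam = d ** 2                            # non-zero eigenvalues of X^T X

recon_ok = np.allclose(X, T @ P.T)              # checks (6.1)
gram_ok = np.allclose(T.T @ T, np.diag(lam))    # checks T^T T = Lambda
scores_ok = np.allclose(T, X @ P)               # checks (6.2)
print(recon_ok, gram_ok, scores_ok)
```

`np.linalg.svd` returns the singular values in decreasing order, so the
eigenvalues in `lam` are already sorted as assumed in the text.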
Now we consider the NIPALS (Nonlinear Iterative Partial Least Squares)
algorithm for finding the principal components of $X$. We want to find the
first $a$ principal components of $X$, starting with the largest
eigenvalue $\lambda_1$ and down. Here $a$ must be less than or equal to
$r$, the rank of $X$.
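The NIPALS algorithm starts with the initialization $j = 1$ and
$X_1 = X$. The algorithm then iterates through the following steps:
1. Choose $t_j$ as any column of $X_j$.
2. Let $p_j = X_j^\top t_j / \lVert X_j^\top t_j \rVert$.
3. Let $t_j = X_j p_j$.
4. If $t_j$ is unchanged (to within a chosen tolerance) continue;
   otherwise return to Step 2.
5. Let $X_{j+1} = X_j - t_j p_j^\top$.
6. Stop if $j = a$; otherwise let $j = j + 1$ and return to Step 1.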
Assume first that $a = r$, so we have found all the principal components.
Now form the matrices $T$ and $P$ with columns $t_1, \ldots, t_r$ and
$p_1, \ldots, p_r$, respectively; these matrices now satisfy (6.1).
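The six steps can be sketched in NumPy as follows. This is an illustrative
implementation, not the module's own code: the function name `nipals`, the
tolerance `tol`, the iteration cap `max_iter`, and the choice of the
largest-norm column as starting vector in Step 1 (the text allows any
column) are all assumptions made here.

```python
import numpy as np

def nipals(X, a, tol=1e-12, max_iter=10_000):
    """Return scores T (n x a) and loadings P (k x a) of a centered matrix X."""
    Xj = np.array(X, dtype=float)          # X_1 = X (copied, X is not modified)
    n, k = Xj.shape
    T = np.zeros((n, a))
    P = np.zeros((k, a))
    for j in range(a):
        # Step 1: start from a column of X_j (assumption: largest-norm column)
        t = Xj[:, np.argmax(np.sum(Xj ** 2, axis=0))].copy()
        for _ in range(max_iter):
            w = Xj.T @ t
            p = w / np.linalg.norm(w)      # Step 2: unit-length loading
            t_new = Xj @ p                 # Step 3: updated score
            change = np.linalg.norm(t_new - t)
            t = t_new
            if change <= tol * np.linalg.norm(t):
                break                      # Step 4: t is (numerically) unchanged
        T[:, j], P[:, j] = t, p
        Xj = Xj - np.outer(t, p)           # Step 5: deflate
    return T, P                            # Step 6: stop after a components

rng = np.random.default_rng(0)             # synthetic data, illustration only
X = rng.normal(size=(10, 4)) * np.array([4.0, 3.0, 2.0, 1.0])
X -= X.mean(axis=0)
T, P = nipals(X, a=4)                      # here a = r = 4
print(np.allclose(X, T @ P.T))
```

The inner loop is a power iteration, so it converges slowly when two
eigenvalues are nearly equal; the iteration cap is only a safeguard for
that situation. Scores and loadings are determined up to a common sign, so
they may differ in sign from those returned by an SVD routine.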
It is possible to modify the NIPALS algorithm to take missing data into
account, see Bro (1996), pp. 43-44.
Let us consider some properties of the NIPALS algorithm, which also help
understand the PCA method.
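That the NIPALS algorithm gives PCA may be seen as follows. Let
$\lambda_j = \lVert X_j^\top t_j \rVert$ and write Step 2 as follows:
$$X_j^\top t_j = \lambda_j p_j.$$
Now insert $t_j = X_j p_j$ from Step 3, giving
$$X_j^\top X_j p_j = \lambda_j p_j. \tag{6.3}$$
This equation is satisfied upon convergence of the loop 2-4. This shows
that $\lambda_j$ and $p_j$ are an eigenvalue and eigenvector of
$X_j^\top X_j$, respectively. Also note that using $t_j = X_j p_j$ and
(6.3) we obtain
$$t_j^\top t_j = p_j^\top X_j^\top X_j p_j = \lambda_j p_j^\top p_j = \lambda_j, \tag{6.4}$$
where in the last step we have used the fact that $p_j$ is a unit vector
(see Step 2).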
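After the first run through the loop 1-5, Step 5 with $j = 1$ gives that
$$X_2 = X - t_1 p_1^\top. \tag{6.5}$$
Let us show that $t_1$ and $t_2$ are orthogonal. In fact
$$X_2^\top t_1 = X^\top t_1 - p_1 t_1^\top t_1 = \lambda_1 p_1 - \lambda_1 p_1 = 0,$$
as seen from (6.3) with $j = 1$, together with
$t_1^\top t_1 = \lambda_1$. Since $t_2$ was initially picked as a column
of $X_2$, it is hence orthogonal to $t_1$ and remains so to the end of the
loop.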
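After the second run through the loop 1-5, we obtain
$$X_3 = X_2 - t_2 p_2^\top = X - t_1 p_1^\top - t_2 p_2^\top.$$
After $a$ runs through the loop, we similarly have
$$X = t_1 p_1^\top + \cdots + t_a p_a^\top + X_{a+1}, \tag{6.6}$$
where $X_{a+1} = 0$ in the case $a = r$ (compare with (6.1)).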
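A short numerical check of the eigenvalue relation, the deflation step and
the orthogonality argument is sketched below. It assumes NumPy and
synthetic data, and it takes the converged scores and loadings from an SVD
rather than from the iteration itself.

```python
import numpy as np

rng = np.random.default_rng(1)           # synthetic data, illustration only
X = rng.normal(size=(9, 4))
X -= X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
T, P, lam = U * d, Vt.T, d ** 2          # converged scores, loadings, eigenvalues

t1, p1 = T[:, 0], P[:, 0]
eig_ok = np.allclose(X.T @ X @ p1, lam[0] * p1)   # (6.3) with j = 1
norm_ok = np.isclose(t1 @ t1, lam[0])             # t_1^T t_1 = lambda_1

X2 = X - np.outer(t1, p1)                # deflated matrix, (6.5)
defl_1 = np.abs(X2.T @ t1).max()         # columns of X_2 are orthogonal to t_1

a = 2
X_next = X - T[:, :a] @ P[:, :a].T       # X_{a+1} in (6.6)
defl_a = np.abs(X_next.T @ T[:, :a]).max()
print(eig_ok, norm_ok, defl_1 < 1e-10, defl_a < 1e-10)
```

Because every column of the deflated matrix is orthogonal to the scores
already extracted, any later score, being a linear combination of those
columns, is orthogonal to them as well.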
As the example suggests, the essence of the PCA method is to decompose the
$X$ matrix as in (6.6),
$$X = T_a P_a^\top + E, \tag{6.7}$$
say, where $T_a$ and $P_a$ contain the first $a$ columns of $T$ and $P$,
respectively. We want to choose $a$ in such a way that $E$ is small and
represents only noise, while the term $T_a P_a^\top$ represents the
salient features of $X$. In order to accomplish this, $a$ must be chosen
in such a way that the terms $t_j p_j^\top$ ($j > a$) that are ignored
correspond to zero or negligible eigenvalues.
In order to help rationalize the choice of $a$, the relative sizes of the
eigenvalues are expressed as percentages of the sum of all eigenvalues,
and each percentage is interpreted as the percent variation explained
by the corresponding principal component. Often, the cumulated percentages
are used, so that the percent variation explained by the first $a$
components is
$$100 \times \frac{\lambda_1 + \cdots + \lambda_a}{\lambda_1 + \cdots + \lambda_r}.$$
As a rule, $a$ should be chosen so that at least about 80-90 percent of
the variation is explained.
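The rule can be turned into a few lines of NumPy. In this sketch the
helper name `components_needed`, the default threshold of 80 percent and
the eigenvalues are all made up for illustration.

```python
import numpy as np

def components_needed(eigenvalues, threshold=80.0):
    """Smallest a whose cumulated percent variation reaches the threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # decreasing
    cum_pct = 100.0 * np.cumsum(lam) / lam.sum()
    a = int(np.searchsorted(cum_pct, threshold)) + 1   # first index reaching it
    return min(a, lam.size), cum_pct

lam = [6.0, 3.0, 0.7, 0.2, 0.1]          # made-up eigenvalues
a, cum_pct = components_needed(lam, threshold=80.0)
print(a, np.round(cum_pct, 1))           # a == 2: two components explain 90 percent
```

A threshold is only a rule of thumb; in practice the choice is usually
checked against prediction performance as well.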
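The justification for the above terminology is that the variance of the
score vector $t_j$ is
$$\frac{t_j^\top t_j}{n-1} = \frac{\lambda_j}{n-1},$$
so that $\lambda_j$ is proportional to the variance of the corresponding
score. In particular, all components with $\lambda_j = 0$ should be left
out. Also, since the covariance matrix of $X$ is
$$\frac{1}{n-1} X^\top X = \frac{1}{n-1} \sum_{j=1}^{r} \lambda_j p_j p_j^\top,$$
we may interpret $\lambda_j p_j p_j^\top / (n-1)$ as the contribution of
the $j$th principal component to the total (co-)variance for $X$. Note
also that the sum of the eigenvalues is equal to
$$\lambda_1 + \cdots + \lambda_r = \mathrm{tr}(X^\top X) = \sum_{i=1}^{n} \sum_{l=1}^{k} x_{il}^2,$$
which is interpreted as the total variance in $X$.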
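These identities are easy to confirm numerically. The sketch below assumes
NumPy, synthetic data, and the divisor $n - 1$ for sample variances and
covariances.

```python
import numpy as np

rng = np.random.default_rng(2)           # synthetic data, illustration only
n, k = 15, 4
X = rng.normal(size=(n, k)) * np.array([3.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
T, P, lam = U * d, Vt.T, d ** 2

score_var = T.var(axis=0, ddof=1)        # sample variance of each score vector
cov_X = np.cov(X, rowvar=False)          # sample covariance matrix of X
cov_from_pca = (P * lam) @ P.T / (n - 1) # sum of lambda_j p_j p_j^T / (n - 1)
total = np.trace(X.T @ X)                # equals the sum of squared entries

print(np.allclose(score_var, lam / (n - 1)),
      np.allclose(cov_X, cov_from_pca),
      np.isclose(lam.sum(), total))
```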
The basic idea in Principal Components Regression (PCR) is that after
choosing a suitable value for $a$ in (6.7), the important features of $X$
have been retained by $T_a$. We then perform the MLR with $T_a$ in place
of $X$ for an $n \times q$ calibration data matrix $Y$,
$$Y = T_a B + F, \tag{6.8}$$
where $F$ is the residual matrix. The least squares method then gives
$$\hat{B} = (T_a^\top T_a)^{-1} T_a^\top Y = \Lambda_a^{-1} T_a^\top Y,$$
where $\Lambda_a = T_a^\top T_a = \mathrm{diag}(\lambda_1, \ldots, \lambda_a)$,
being diagonal, is easy to invert. The fact that we have left out the
loadings matrix $P_a^\top$ in (6.8) is of no consequence for prediction,
because the scores $T_a$ are linear combinations of the columns of $X$,
and the PCR method amounts to singling out those linear combinations that
are best for predicting $Y$.
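Because the matrix to be inverted is diagonal, the least squares estimate
reduces to a row-wise division. The sketch below assumes NumPy and
synthetic calibration data, and compares the result with a generic least
squares solver applied to the scores.

```python
import numpy as np

rng = np.random.default_rng(3)           # synthetic data, illustration only
n, k, q, a = 20, 6, 2, 3
X = rng.normal(size=(n, k))
X -= X.mean(axis=0)
Y = rng.normal(size=(n, q))
Y -= Y.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
T_a = U[:, :a] * d[:a]                   # first a score vectors
lam_a = d[:a] ** 2                       # diagonal of T_a^T T_a

B_hat = (T_a.T @ Y) / lam_a[:, None]     # Lambda_a^{-1} T_a^T Y, no matrix inverse
B_lstsq, *_ = np.linalg.lstsq(T_a, Y, rcond=None)
same = np.allclose(B_hat, B_lstsq)
print(same)
```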
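For prediction with PCR, it is necessary to turn to $X$ again, and using
(6.2) we may write the regression equation as follows:
$$\hat{Y} = T_a \hat{B} = X P_a \hat{B}.$$
Consider a new sample spectrum $x^\top$ and predicted value
$\hat{y}^\top$ (both uncentered), and let $\bar{x}^\top$ and
$\bar{y}^\top$ be the calibration sample averages. Then the prediction
takes the form
$$\hat{y}^\top = \bar{y}^\top + (x^\top - \bar{x}^\top) P_a \hat{B}. \tag{6.9}$$
The matrix $P_a \hat{B}$ is called the regression matrix, and may be
compared with the $\hat{B}$ matrix of MLR.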
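Calibration and prediction can be combined into two small functions. This
is a sketch under stated assumptions: the names `pcr_fit` and
`pcr_predict` are hypothetical, the loadings are obtained from an SVD
rather than from NIPALS, and the noise-free synthetic data serve only to
make the result easy to verify.

```python
import numpy as np

def pcr_fit(X_raw, Y_raw, a):
    """Calibrate PCR with a components on uncentered data."""
    x_bar, y_bar = X_raw.mean(axis=0), Y_raw.mean(axis=0)
    X, Y = X_raw - x_bar, Y_raw - y_bar              # center both blocks
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    P_a = Vt[:a].T                                   # loadings
    T_a = U[:, :a] * d[:a]                           # scores
    B_hat = (T_a.T @ Y) / (d[:a] ** 2)[:, None]      # Lambda_a^{-1} T_a^T Y
    return x_bar, y_bar, P_a @ B_hat                 # regression matrix last

def pcr_predict(model, x_new):
    """Predict for one or more new (uncentered) samples."""
    x_bar, y_bar, R = model
    return y_bar + (np.atleast_2d(x_new) - x_bar) @ R

rng = np.random.default_rng(4)                       # synthetic data
n, k, q = 12, 3, 2
X_raw = rng.normal(size=(n, k)) + 5.0
B_true = rng.normal(size=(k, q))
Y_raw = 2.0 + X_raw @ B_true                         # noise-free, for checking only

model = pcr_fit(X_raw, Y_raw, a=k)
x_new = rng.normal(size=k) + 5.0
y_pred = pcr_predict(model, x_new)
print(np.allclose(y_pred, 2.0 + x_new @ B_true))
```

With all $k$ components included and noise-free data the prediction is
exact; with fewer components or noisy data it is not.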
Just as in MLR, the same prediction would be obtained if the columns of
$Y$ were considered separately. The fact that $P_a$ appears in the PCR
Equation (6.9) may be seen as compensating for the fact that we left out
$P_a^\top$ in (6.8). When comparing with MLR, the role of the
$(X^\top X)^{-1}$ matrix is now played by
$P_a \Lambda_a^{-1} P_a^\top$, since
$P_a \hat{B} = P_a \Lambda_a^{-1} P_a^\top X^\top Y$.
In the case where $X$ has rank $k$ and $a = k$, the two methods will give
identical results. When $a < k$, and still $X$ has rank $k$, the results
of PCR may differ somewhat from those of MLR, depending on the number and
sizes of the components left out.
However, PCR has some major advantages over MLR, in that $X^\top X$ may be
singular, and the case $n < k$ may be dealt with. Note, however, that in
the latter case $r \le n - 1$ (the centered matrix $X$ has rank at most
$n - 1$), and so at most $n - 1$ components may be included.
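The comparison with MLR can be illustrated numerically. The sketch below
assumes NumPy and synthetic centered data; the helper name
`pcr_regression_matrix` is hypothetical, and MLR is computed with a
generic least squares solver.

```python
import numpy as np

def pcr_regression_matrix(X, Y, a):
    """P_a Lambda_a^{-1} T_a^T Y for centered X and Y, computed via the SVD."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:a].T @ ((U[:, :a].T @ Y) / d[:a, None])

rng = np.random.default_rng(5)           # synthetic data, illustration only

# Tall case: n > k and X of rank k
n, k = 12, 4
X = rng.normal(size=(n, k))
X -= X.mean(axis=0)
Y = rng.normal(size=(n, 1))
Y -= Y.mean(axis=0)
B_mlr, *_ = np.linalg.lstsq(X, Y, rcond=None)
full_matches = np.allclose(pcr_regression_matrix(X, Y, a=k), B_mlr)   # a = k
reduced_gap = np.abs(pcr_regression_matrix(X, Y, a=2) - B_mlr).max()  # a < k

# Wide case: n < k, so X^T X is singular
n_w, k_w = 5, 10
Xw = rng.normal(size=(n_w, k_w))
Xw -= Xw.mean(axis=0)
Yw = rng.normal(size=(n_w, 1))
Yw -= Yw.mean(axis=0)
rank_w = np.linalg.matrix_rank(Xw)       # at most n_w - 1 after centering
Rw = pcr_regression_matrix(Xw, Yw, a=rank_w)
fit_w = Xw @ Rw                          # fitted (centered) responses

print(full_matches, reduced_gap > 0, rank_w)
```

In the wide case, including all $r$ components reproduces the calibration
responses exactly, which is a reminder that a smaller $a$ is usually
preferable when the aim is prediction of new samples.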