View/Print PDFPS
Module 8: Partial least squares regression II
By Bent Jørgensen and Yuri Goegebeur


Table of Contents





8.1 The PLS2 algorithm

Previous Section
Next Section

Play it again, Sam. [Woody Allen, 1972]

Kiss me. Kiss me as if it were the last time. [Ilsa Lund Laszlo, Casablanca, 1942]

We now consider the PLS2 algorithm. We assume that $ \boldsymbol{X}$ and $ \boldsymbol{\ Y}$ are centered data matrices. As already mentioned, one may use PLS1 separately for each analyte ( $ \boldsymbol{Y}$ -column), which allows a separate optimal model to be constructed for each analyte. It may, however, be advantageous to include information from other analytes when predicting any specific analyte. This may be done by constructing an overall model describing $ \boldsymbol{Y}$ as a function of $ \boldsymbol{X}$ , and for this purpose we may use the PLS2 method.

When several analytes are to be predicted simultaneously, the situation becomes more complicated than for the PLS1 algorithm. Suffice it to say that separate application of the PLS1 algorithm to each column of $ \boldsymbol{Y}$ would lead to different sets of scores being formed for each $ \boldsymbol{Y}$ -column. In PLS2, these separate scores are in effect reconciled into a single set of scores, but this extra constraint implies a more complex algorithm. Note that such a complication does not arise in connection with PCR, because PCR does not take $ \boldsymbol{Y}$ into account when forming the scores.

The principle behind the PLS2 algorithm may be outlined as follows. Similar to PLS1, we form a model for $ \boldsymbol{X}$ as in (7.3), namely

$\displaystyle \boldsymbol{X}=\boldsymbol{TP}^{\top }+\boldsymbol{X}_{g+1}$ (8.1)

when $ g$ scores are to be used. The scores (columns of $ \boldsymbol{T}$ ) is the single set of scores alluded to above. But now we form a similar model for $ \boldsymbol{Y}$ , namely

$\displaystyle \boldsymbol{Y}=\boldsymbol{UQ}^{\top }+\boldsymbol{Y}_{g+1}.$ (8.2)

This includes a second set of scores for $ \boldsymbol{Y}$ , namely the columns of $ \boldsymbol{U}$ . These two equations are linked by an inner relationship,

$\displaystyle \boldsymbol{U}=\boldsymbol{TC}+\boldsymbol{U}_{g+1},$ (8.3)

meaning a relationship that holds between latent, rather than observed variables. The two matrices $ \boldsymbol{U}$ and $ \boldsymbol{T}$ are both $ n\times g$ , and $ \boldsymbol{T}$ has orthogonal columns. $ \boldsymbol{P}$ is a $ k\times g$ matrix, $ \boldsymbol{Q}$ is an $ m\times g$ matrix whose columns are unit vectors, and $ \boldsymbol{C}$ is a $ g\times g$ diagonal matrix of regression coefficients. Similar to PLS1, we will also need the $ k\times g$ orthogonal matrix $ \boldsymbol{W}$ .

The three `error' terms $ \boldsymbol{X}_{g+1}$ $ \boldsymbol{Y}_{g+1}$ and $ \boldsymbol{U}_{g+1}$ are supposed to represent noise. Hence $ g$ should be chosen large enough to make the term $ \boldsymbol{X}_{g+1}$ useless for predicting $ \boldsymbol{Y}_{g+1}$ ; in other words, $ \boldsymbol{X}_{g+1}$ and $ \boldsymbol{Y}_{g+1}$ should be approximately uncorrelated. The relations (8.1), (8.2) and (8.3) are not enough, in themselves, to define the method, which will again be defined by the actual algorithm.

Ignoring the error terms in (8.2) and (8.3), and using the estimated value of $ \boldsymbol{C}$ , we obtain the predicted value of $ \boldsymbol{Y}$ as follows:

$\displaystyle \widehat{\boldsymbol{Y}}=\boldsymbol{T}\widehat{\boldsymbol{C}}\boldsymbol{Q}^{\top }.$ (8.4)

As before, $ \boldsymbol{Y}$ is predicted by means of the $ \boldsymbol{X}$ -scores in $ \boldsymbol{T}$ , but the $ \boldsymbol{Y}$ -loadings $ \boldsymbol{Q}$ now enter the prediction equation as well.

Since the PLS2 algorithm is based on the covariance between $ \boldsymbol{X}$ and $ \boldsymbol{Y}$ , a $ \boldsymbol{Y}$ -column with large elements relative to the remaining columns will receive a correspondingly larger weight in the algorithm. Since the $ \boldsymbol{Y}$ -columns normally represent different analytes, that may be measured in different scales, it is advisable to scale the columns of $ \boldsymbol{Y}$ properly, for example by autoscaling. Similar remarks apply in principle to $ \boldsymbol{X}$ , but usually $ \boldsymbol{X}$ is a spectral matrix, where autoscaling is not necessary.




8.1.1 The steps of the PLS2 algorithm

Previous Section
Next Section

After this overture, we now proceed to describe the actual PLS2 algorithm, which, like the NIPALS algorithm, is iterative, rather than just recursive. The algorithm starts with the initialization $ j=1$ , $ \boldsymbol{X}_{1}\boldsymbol{=X}$ and $ \boldsymbol{Y}_{1}\boldsymbol{=Y}$ , and then proceeds through the following steps to find the first $ g$ terms:

  1. The vector $ \boldsymbol{u}_{j}$ is initialized to be an arbitrary column of $ \boldsymbol{Y}_{j}$ .

  2. Let $ \boldsymbol{w}_{j}=\boldsymbol{X}_{j}^{\top }\boldsymbol{u}_{j}/\left\Vert \boldsymbol{X}_{j}^{\top }\boldsymbol{u}_{j}\right\Vert $ .

  3. Let $ \boldsymbol{t}_{j}=\boldsymbol{X}_{j}\boldsymbol{w}_{j}$ .

  4. Let $ \boldsymbol{q}_{j}=\boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}/\left\Vert \boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}\right\Vert .$

  5. Let $ \boldsymbol{u}_{j}=\boldsymbol{Y}_{j}\boldsymbol{q}_{j}$ .

  6. If $ \boldsymbol{u}_{j}$ is unchanged continue with Step 7; otherwise go back to Step 2.

  7. Let $ \widehat{c}_{j}=\boldsymbol{t}_{j}^{\top }\boldsymbol{u}_{j}\boldsymbol{/t}_{j}^{\top }\boldsymbol{t}_{j}$ .

  8. Let $ \boldsymbol{p}_{j}=\boldsymbol{X}_{j}^{\top }\boldsymbol{t}_{j}/\boldsymbol{t}_{j}^{\top }\boldsymbol{t}_{j}$ .

  9. Let $ \boldsymbol{X}_{j+1}\boldsymbol{=X}_{j}-\boldsymbol{t}_{j}\boldsymbol{p}_{j}^{\top }$ and $ \boldsymbol{Y}_{j+1}\boldsymbol{=Y}_{j}-\widehat{c}_{j}\boldsymbol{t}_{j}\boldsymbol{q}_{j}^{\top }$ .

  10. Stop if $ j=g$ ; otherwise let $ j=j+1$ and return to Step 1.

Now form the matrices $ \boldsymbol{W}$ , $ \boldsymbol{T}$ , $ \boldsymbol{Q}$ , $ \boldsymbol{U}$ and $ \boldsymbol{P}$ with columns $ \boldsymbol{w}_{j}$ , $ \boldsymbol{t}_{j}$ , $ \boldsymbol{q}_{j}$ , $ \boldsymbol{u}_{j}$ and $ \boldsymbol{p}_{j}$ , respectively, and form the $ g\times g$ diagonal coefficient matrix $ \widehat{\boldsymbol{C}}$ with diagonal elements $ \widehat{c}_{j}$ . After $ g$ runs through the algorithm, the relations (8.1), (8.2), (8.3) and (8.4) are satisfied.

In the special case $ m=1$ , PLS2 reduces to the PLS1, because then $ \boldsymbol{q}_{j}$ in Step 4 is 1, and $ \boldsymbol{u}_{j}=\boldsymbol{y}_{j}$ in Step 5.




8.1.2 Comments on the PLS2 algorithm

Previous Section
Next Section

We now comment on the main steps of the PLS2 algorithm. We consider run number $ j$ , and assume that convergence has been achieved in Step 6, so that all equations are satisfied.

Step 2. This step is similar to Step 1 of the PLS1 algorithm. But now the weights are based on the covariance between $ \boldsymbol{X}_{j}$ and $ \boldsymbol{u}_{j}$ , rather than between $ \boldsymbol{X}_{j}$ and the single column $ \boldsymbol{y}_{j}$ . In this way, $ \boldsymbol{u}_{j}$ is used as a representative of $ \boldsymbol{Y}_{j}$ , a compromise that is necessary in order to reconcile the information coming from the different columns of $ \boldsymbol{Y}_{j}.$

Step 3. The score vector $ \boldsymbol{t}_{j}$ is formed as a linear combination of the columns of $ \boldsymbol{X}_{j}$ with weights from $ \boldsymbol{w}_{j}$ , just as we did in the PLS1 algorithm.

Step 4. The weights $ \boldsymbol{q}_{j}$ are calculated from the covariances between $ \boldsymbol{Y}_{j}$ and $ \boldsymbol{t}_{j}$ in much the same way that $ \boldsymbol{w}_{j}$ is calculated from the covariances between $ \boldsymbol{X}_{j}$ and $ \boldsymbol{u}_{j}.$

Step 5. The score vector $ \boldsymbol{u}_{j}$ is formed as a linear combination of the columns of $ \boldsymbol{Y}_{j}$ with weights from $ \boldsymbol{q}_{j}$ ; a parallel with Step 3.

Step 7. The regression coefficient $ \widehat{c}_{j}$ is calculated by ordinary linear regression of $ \boldsymbol{u}_{j}$ on $ \boldsymbol{t}_{j}$ .

Step 8. The vector $ \boldsymbol{p}_{j}^{\top }$ is the vector of regression coefficients obtained from multiple linear regression of $ \boldsymbol{X}_{j}$ on $ \boldsymbol{t}_{j}$ .

Step 9. The vector $ \boldsymbol{X}_{j+1}\boldsymbol{=X}_{j}-\boldsymbol{t}_{j}\boldsymbol{p}_{j}^{\top }$ represents the residuals after regressing $ \boldsymbol{X}_{j}$ on $ \boldsymbol{t}_{j}$ , and correspondingly $ \boldsymbol{Y}_{j+1}\boldsymbol{=Y}_{j}-\widehat{c}_{j}\boldsymbol{t}_{j}\boldsymbol{q}_{j}^{\top }$ are residuals after regressing $ \boldsymbol{Y}_{j}$ on $ \boldsymbol{u}_{j}$ and taking the form (8.3) of $ \boldsymbol{U}$ into account. This step ensures that the $ \boldsymbol{t}_{j}$ -vectors become orthogonal.

After completing the $ g$ runs, the relations (8.1), (8.2) and ( 8.3) hold, and the predicted value of $ \boldsymbol{Y}$ for the calibration sample is given by (8.4). Again, the number of scores $ g$ should be chosen such that further scores are extracted only as long as each new variable contributes significantly to the description of $ \boldsymbol{Y}$ .

A deeper understanding of the PLS2 method may be obtained by studying the equations governing the solution obtained by the algorithm after convergence of Steps 1-6. Applying Steps 3, 4, 5 and 1 in turn, the following sequence of identities is obtained:

$\displaystyle \boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}\boldsymbol{Y}_{j}^{\top }\boldsymbol{X}_{j}\boldsymbol{w}_{j}$ $\displaystyle =$ $\displaystyle \boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}\boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}$  
  $\displaystyle =$ $\displaystyle \boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}\boldsymbol{q}_{j}\left\Vert
\boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}\right\Vert$  
  $\displaystyle =$ $\displaystyle \boldsymbol{X}_{j}^{\top }\boldsymbol{u}_{j}\left\Vert \boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}\right\Vert$  
  $\displaystyle =$ $\displaystyle \boldsymbol{w}_{j}\left\Vert \boldsymbol{Y}_{j}^{\top }\boldsymbo...
...}\right\Vert \left\Vert \boldsymbol{X}_{j}^{\top }\boldsymbol{u}_{j}\right\Vert$  
  $\displaystyle =$ $\displaystyle \lambda _{j}\boldsymbol{w}_{j},$  

say, where $ \lambda _{j}=\left\Vert \boldsymbol{Y}_{j}^{\top }\boldsymbol{t}_{j}\right\Vert \left\Vert \boldsymbol{X}_{j}^{\top }\boldsymbol{u}_{j}\right\Vert $ . This shows that $ \boldsymbol{w}_{j}$ is an eigenvector for the matrix $ \boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}\boldsymbol{Y}_{j}^{\top }\boldsymbol{X}_{j}$ . Similar arguments show that

  • $ \boldsymbol{t}_{j}$ is an eigenvector of the matrix $ \boldsymbol{X}_{j}\boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}\boldsymbol{Y}_{j}^{\top }$ ;

  • $ \boldsymbol{q}_{j}$ is an eigenvector of the matrix $ \boldsymbol{Y}_{j}^{\top }\boldsymbol{X}_{j}\boldsymbol{X}_{j}^{\top }\boldsymbol{Y}_{j}$ ;

  • $ \boldsymbol{u}_{j}$ is an eigenvector of the matrix $ \boldsymbol{Y}_{j}\boldsymbol{Y}_{j}^{\top }\boldsymbol{X}_{j}\boldsymbol{X}_{j}^{\top }$ .

These eigenvectors are found by the algorithm in much the same way that the eigenvectors of $ \boldsymbol{X}^{\top }\boldsymbol{X}$ are found by the NIPALS algorithm.




8.2 Prediction for PLS2

Previous Section
Next Section

Prediction for the PLS2 method is quite similar to the case of PLS1, as long as we take the extra elements of the PLS2 algorithm into account. Consider, as before, a new prediction sample $ \boldsymbol{z}$ ($ 1\times k$ ) and predicted value $ \widehat{\boldsymbol{y}}(\boldsymbol{z})$ ($ 1\times m$ ) (both uncentered), and let $ \overline{\boldsymbol{x}}$ and $ \overline{\boldsymbol{y}}$ be the calibration sample averages.

In order to follow the steps of the PLS2 algorithm, now for the purpose of prediction, we initialize by taking $ j=1$ and $ \boldsymbol{x}_{1}=\boldsymbol{z}-\overline{\boldsymbol{x}}$ . The prediction then proceeds through the following steps:

  1. Let $ t_{j}=\boldsymbol{x}_{j}\boldsymbol{w}_{j}.$

  2. Let $ \boldsymbol{x}_{j+1}\boldsymbol{=x}_{j}-t_{j}\boldsymbol{p}_{j}^{\top }$ .

  3. Stop if $ j=g$ ; otherwise let $ j=j+1$ , and go back to Step 1.

Now form the row vector $ \widehat{\boldsymbol{t}}=(t_{1},\ldots ,t_{g})$ , and complete the prediction as follows:

$\displaystyle \widehat{\boldsymbol{y}}(\boldsymbol{z})=\overline{\boldsymbol{y}}+\widehat{\boldsymbol{t}}\widehat{\boldsymbol{C}}\boldsymbol{Q}^{\top }.
$

It is also possible to write the prediction in matrix form (Bro, 1996, p. 69), as follows:

$\displaystyle \widehat{\boldsymbol{y}}(\boldsymbol{z})=\overline{\boldsymbol{y}}+\left(
\boldsymbol{z}-\overline{\boldsymbol{x}}\right) \widehat{\boldsymbol{B}}$,

where the regression matrix $ \widehat{\boldsymbol{B}}$ is

$\displaystyle \widehat{\boldsymbol{B}}=\boldsymbol{W}\left( \boldsymbol{P}^{\top }\boldsymbol{W}\right) ^{-1}\widehat{\boldsymbol{C}}\boldsymbol{Q}^{\top }$.




8.3 Selection of the number of components

Previous Section
Next Section

Similar to what we did for PLS1, we decide on the number of component by considering the proportion of the total $ \boldsymbol{X}$ and $ \boldsymbol{Y}$ variance accounted for by the model.

The $ \boldsymbol{X}$ variance accounted for by component $ j$ is

$\displaystyle \boldsymbol{t}_{j}^{\top }\boldsymbol{t}_{j}\boldsymbol{p}_{j}^{\top }\boldsymbol{p}_{j}$,

or, relatively, as a fraction of total $ \boldsymbol{X}$ -variance

$\displaystyle \frac{\boldsymbol{t}_{j}^{\top }\boldsymbol{t}_{j}\boldsymbol{p}_{j}^{\top }\boldsymbol{p}_{j}}{\mathrm{tr}\;\boldsymbol{X}^{\top }\boldsymbol{X}}$.

The cumulative proportion of the total $ \boldsymbol{X}$ -variance accounted for by the model with $ g$ components is hence

$\displaystyle \frac{\sum_{j=1}^{g}\boldsymbol{t}_{j}^{\top }\boldsymbol{t}_{j}\...
...{\top }\boldsymbol{p}_{j}}{\mathrm{tr}\;\boldsymbol{X}^{\top }\boldsymbol{X}}.
$

The $ \boldsymbol{Y}$ -variance accounted for by the model with $ g$ components is

$\displaystyle 1-\frac{\mathrm{tr}\;(\boldsymbol{Y}-\widehat{\boldsymbol{Y}})^{\...
...\widehat{\boldsymbol{Y}})}{\mathrm{tr}\;\boldsymbol{Y}^{\top }\boldsymbol{Y}}.
$

The number of components $ g$ should be chosen such that the percentage variation explained is large for both $ \boldsymbol{X}$ and $ \boldsymbol{Y}$ .

HOME | Back

Last modified January 29, 2007. Webmaster
©2001-2005 Master Of Applied Statistics