Module 9: Optimizing your model
By Bent Jørgensen and Yuri Goegebeur







9.1 Choice of calibration method


Capt. Renault: What in heaven's name brought you to Casablanca? Rick: My health. I came to Casablanca for the waters. Capt. Renault: The waters? What waters? We're in the desert. Rick: I was misinformed. [Casablanca, 1942]

The topic of optimization concerns a set of techniques designed to ensure that you get the best out of your model and data, and that you avoid any pitfalls on the way. The techniques allow you to take a critical look at your model in order to spot anything that might lead to incorrect or misleading results. We first discuss the choice of calibration method, and then move on to discuss outlier detection, and the use of graphical displays.

In the previous modules, we have considered several competing calibration methods, and discussed some of their advantages and disadvantages. We now review the main considerations regarding the choice of calibration method. Assume that the calibration data consist of centered data matrices $ \boldsymbol{X}$ and $ \boldsymbol{Y}$ , of dimensions $ n\times k$ and $ n\times m$ , respectively. Let us consider the four methods discussed so far:

  • MLR--Multiple Linear Regression.

  • CLS--Classical Least Squares.

  • PCR--Principal Components Regression.

  • PLS--Partial Least Squares.

First a note about linearity. As we have seen, prediction takes the following linear form:

$\displaystyle \widehat{\boldsymbol{y}}=\overline{\boldsymbol{y}}+\left( \boldsymbol{z}-\overline{\boldsymbol{x}}\right) \widehat{\boldsymbol{b}}$,

where $ \boldsymbol{z}$ is the row vector of $ x$ -values for the sample to be predicted, $ \overline{\boldsymbol{x}}$ and $ \overline{\boldsymbol{y}}$ are the calibration means, and $ \widehat{\boldsymbol{b}}$ is the so-called regression matrix. In effect, the only difference between the four methods is the way in which the regression matrix $ \widehat{\boldsymbol{b}}$ is calculated. In this sense, all four methods are linear, and they are not suitable for genuinely nonlinear problems. We comment below on nonlinearity and some other problems that may occur with calibration models.
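The linear prediction form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the names `X_raw`, `Y_raw`, `B_true` and `predict` are hypothetical, and the regression matrix is computed by ordinary least squares (the MLR choice). Another method would change only the line that computes `b_hat`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated calibration data (assumption): n = 20 samples, k = 3, m = 2.
n, k, m = 20, 3, 2
X_raw = rng.normal(size=(n, k))
B_true = np.array([[1.0, 0.5], [-2.0, 0.0], [0.3, 1.5]])
Y_raw = 4.0 + X_raw @ B_true + 0.01 * rng.normal(size=(n, m))

# Center both blocks, as assumed in the text.
x_bar = X_raw.mean(axis=0)
y_bar = Y_raw.mean(axis=0)
X = X_raw - x_bar
Y = Y_raw - y_bar

# Regression matrix by least squares (MLR); only this step is method-specific.
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict(z):
    """Prediction for a new row vector z: y_bar + (z - x_bar) b_hat."""
    return y_bar + (z - x_bar) @ b_hat

z_new = np.array([0.2, -0.1, 0.4])
y_pred = predict(z_new)
print(y_pred)
```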

There are two particular problems often met in calibration, namely interference and matrix effects. Interference is when either other signal-emitting elements present in the sample or physical conditions in the equipment influence the results for the compound(s) of interest. Matrix effects are interferences originating in conditions or compounds in the sample, such as temperature or pH, that do not emit signals as such, but may change the actual signal measured for the compound(s) of interest.




9.1.1 Few $ x$ -variables


The case of few $ x$ -variables is when $ k<n$ , that is, there are fewer $ x$ -variables than calibration samples. Under this heading we consider MLR, which is the only method among the four that limits the number of $ x$ -variables in this way. This may be a severe restriction in a chemometric setting, but it is important to recognize that statistically, a good prediction method requires a large number of calibration samples ($ n$ large), and no number of $ x$ -variables can compensate for too few calibration samples.

Now let us summarize some of the pros and cons of MLR.

  • Pros of MLR:

    • Copes well with interferences.

    • Copes well with matrix effects.

  • Cons of MLR:

    • Requires $ k<n$ .

    • Does not cope well with nonlinearity.

    • Does not cope well with collinearity (see below).

    • Too many $ x$ -variables give less precise predictions.
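Two of the cons listed above can be made concrete numerically. The following sketch uses simulated data and hypothetical helper names. It checks that a column-centered $ n\times k$ matrix has full column rank only when $ k<n$ (so that $ \boldsymbol{X}^{\top }\boldsymbol{X}$ can be inverted), and that a nearly duplicated column inflates the condition number, which is one common way of quantifying collinearity.

```python
import numpy as np

rng = np.random.default_rng(1)

def centered_rank(n, k):
    """Rank of a random column-centered n x k matrix X (equals the rank of X^T X)."""
    X = rng.normal(size=(n, k))
    X = X - X.mean(axis=0)
    return int(np.linalg.matrix_rank(X))

rank_few = centered_rank(n=10, k=4)     # k < n: full column rank, X^T X invertible
rank_many = centered_rank(n=10, k=25)   # k >= n: rank at most n - 1, X^T X singular

# Collinearity: append a near-copy of the first column to a well-conditioned X.
X = rng.normal(size=(30, 3))
X_collinear = np.column_stack([X, X[:, 0] + 1e-6 * rng.normal(size=30)])
cond_plain = np.linalg.cond(X)
cond_collinear = np.linalg.cond(X_collinear)

print(rank_few, rank_many)
print(f"condition numbers: {cond_plain:.1f} vs {cond_collinear:.2e}")
```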




9.1.2 Many $ x$ -variables


The case of many $ x$ -variables refers to the case $ k\geq n$ , that is, there are at least as many $ x$ -variables as calibration samples, which is often the case in chemometrics. This case may in principle be handled by any of the three methods CLS, PCR and PLS. We now summarize some of the pros and cons of these methods.

  • General pros of CLS, PCR and PLS:

    • Eliminate noise in $ x$ to a certain degree.

CLS

  • Pros of CLS:

    • Simple chemical interpretation (Beer-Lambert's law).

  • Cons of CLS:

    • Does not cope well with nonlinearity.

    • Does not cope well with interferences, unless they are known in advance.

    • Does not cope well with matrix effects, unless they are linear.

PCR and PLS

The PCR and PLS methods are based on the principle of using as few latent variables (scores) as possible, while at the same time including information from the whole spectrum (all $ x$ -variables). They differ from the CLS method in that CLS uses the whole spectrum directly, whereas PCR and PLS use only functions of the spectrum, namely the scores. PCR and PLS can never use more than $ n$ scores in the model, however.

  • General pros of PCR and PLS:

    • Cope with nonlinearities to some extent.

    • Cope well with interferences.

    • Cope well with collinearity.

  • Pros of PCR:

    • Simple expression for the prediction variance.

  • Cons of PCR:

    • $ \boldsymbol{Y}$ is not taken into account in the decomposition of $ \boldsymbol{X}$ .

  • Pros of PLS:

    • Takes both $ \boldsymbol{X}$ and $ \boldsymbol{Y}$ into account in the decomposition of $ \boldsymbol{X}$ .

  • Cons of PLS:

    • The prediction variance is difficult to calculate.

    • PLS is inappropriate if $ \boldsymbol{Y}$ has too much variation.
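As an illustration of how a few scores can carry the information from many $ x$ -variables, here is a minimal PCR sketch on simulated data with $ k>n$ . The data, the choice of `g = 3` scores and all variable names are assumptions made for the example. The decomposition of $ \boldsymbol{X}$ is obtained from a singular value decomposition and, as noted in the cons of PCR, does not use $ \boldsymbol{y}$ . PLS is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (assumption): n = 15 samples, k = 40 x-variables, g = 3 latent variables.
n, k, g = 15, 40, 3
T_true = rng.normal(size=(n, g))
P_true = rng.normal(size=(k, g))
X = T_true @ P_true.T + 0.01 * rng.normal(size=(n, k))
y = T_true @ np.array([1.0, -0.5, 2.0]) + 0.01 * rng.normal(size=n)

# Center the data.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Principal components of X: the decomposition uses X only, not y.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:g].T          # loadings, k x g
T = Xc @ P            # scores, n x g

# Regress y on the g scores, then map back to a k-vector of coefficients.
q, *_ = np.linalg.lstsq(T, yc, rcond=None)
b_pcr = P @ q

fitted = Xc @ b_pcr
r2 = 1.0 - np.sum((yc - fitted) ** 2) / np.sum(yc ** 2)
print(f"R^2 with {g} scores: {r2:.4f}")
```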




9.2 Calibration problems


We now present some of the potential problems in calibration that have to be solved in order to get the best out of your data and model.

  1. The effect of random error.

    Calibration data always contain errors, either measurement errors or other forms of noise. There are many potential sources of errors, such as sample heterogeneity, thermal noise in electronic circuits, false light sources, uncontrolled interferences and so on. Some may be due to human mistake or hardware failure, such as incorrect labeling of test tubes, defective sensors, contamination of material, or samples that do not originate from the intended study population. Although some of these errors may actually be deterministic, we often assume that the errors are randomly distributed, because we may not understand or know about all of them, and cannot predict them.

  2. Outliers and robustness.

    Any sample or variable that somehow deviates from the majority is called an outlier. As the discussion of Item 1 suggests, random errors and outliers may to some extent have common origins, but pragmatically, outliers are defined as those errors that stand out, either visually on a plot, or according to a more specific criterion.

    Robustness, in the statistical sense, is a property of a statistical method. First, it means that the results obtained using the method are to some extent insensitive to small deviations from the assumptions behind the method. Second, it means that results ought not to depend crucially on whether certain samples or variables are included in the calibration set or not, so that, in particular, results are not unduly distorted by one or more outliers in the data. Robustness is a desirable, indeed crucial, property of a good calibration method. Methods for detection and analysis of outliers, along with a discussion of robustness, will be considered below.

  3. Collinearity (sometimes called multicollinearity).

    Collinearity often arises for spectral data $ X$ , because absorbances for neighboring frequencies are correlated. It may also be due to other causes, in cases where the information carried by the $ X$ - or $ Y$ -block is smaller than the number of variables ($ k$ and $ m$ , respectively) might suggest. Both PCR and PLS deal well with collinearity in the $ X$ - and $ Y$ -blocks, but if collinearity in $ Y$ is revealed, one should look for the underlying cause, be it chemical or from other causes such as closed systems (where concentrations add up to a constant).

  4. Spanning the calibration space.

    If the chosen calibration samples do not fully represent the actual conditions under which the method is to be used, bad predictions may result. For example, if a calibration study is conducted with large concentrations of the analyte, whereas the actual conditions under which the method is to be used require small concentrations, the calibration may be misleading. External conditions should also be representative of the real conditions where the calibration model will be used. For example, if trial runs of a production are conducted during summer holidays, the method may fail in cold weather if the influence of temperature was not properly taken into account.

    To obtain a set of representative samples, proper use of experimental design should be coupled with sound chemical knowledge. The question of experimental design, although important, is not studied as such here. Later we present various graphical displays that can help determine if the calibration set may be considered representative for the set of conditions we want to describe.

  5. Nonlinearity.

    This problem is dealt with in the next section.

  6. Under- and overfitting.

    Under- and overfitting is mainly a question of selecting the correct number of scores. This problem will be treated in the next module.




9.3 Nonlinear models


The theory of nonlinear models is beyond the scope of the present notes, but a few remarks about this topic are in order. The following considerations show that the scope of linear models is wider than may appear at first sight.




9.3.1 Linearization by Taylor-expansion


Suppose that the relation between the spectrum $ \boldsymbol{x}$ and the vector of concentrations $ \boldsymbol{y}$ ($ k\times 1$ and $ m\times 1$ vectors, respectively) is given by a smooth function $ f:\boldsymbol{R}^{k}\rightarrow \boldsymbol{R}^{m}$ , possibly nonlinear, ignoring noise. Then a Taylor-expansion of $ f$ gives

$\displaystyle f(\boldsymbol{x})-f(\boldsymbol{x}_{0})\approx \nabla _{x}f(\boldsymbol{x}_{0})(\boldsymbol{x}-\boldsymbol{x}_{0}),$ (9.1)

where $ \nabla $ is the gradient operator. If we now have $ n$ calibration samples with $ \boldsymbol{x}$ -values $ \boldsymbol{x}_{1},\ldots ,\boldsymbol{x}_{n}$ , we may form the matrices

\begin{displaymath}
\boldsymbol{Y}=\left[
\begin{array}{c}
\left\{ f(\boldsymbol{x}_{1})-f(\boldsymbol{x}_{0})\right\} ^{\top } \\
\vdots \\
\left\{ f(\boldsymbol{x}_{n})-f(\boldsymbol{x}_{0})\right\} ^{\top }
\end{array}\right]
\end{displaymath}

($ n\times m$ ),

\begin{displaymath}
\boldsymbol{X}=\left[
\begin{array}{c}
(\boldsymbol{x}_{1}-\boldsymbol{x}_{0})^{\top } \\
\vdots \\
(\boldsymbol{x}_{n}-\boldsymbol{x}_{0})^{\top }
\end{array}\right]
\end{displaymath}

($ n\times k$ ) and

$\displaystyle \boldsymbol{B}=\nabla _{x}f(\boldsymbol{x}_{0})^{\top }$

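($ k\times m$ ). Including the usual $ n\times m$ noise term $ \boldsymbol{F}$ , and absorbing the error in (9.1) into $ \boldsymbol{F}$ , we obtain $ \boldsymbol{Y}=\boldsymbol{XB}+\boldsymbol{F}$ . Centering the columns (with the noise term centered accordingly) removes the constant terms, and we hence obtain the following linear model for the centered data matrices:

$\displaystyle \boldsymbol{\dot{Y}}=\boldsymbol{\dot{X}B+F}$.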

Note that a small amount of nonlinearity (from the missing remainder term in (9.1)) may be absorbed into the noise term $ \boldsymbol{F}$ . In this sense, any nonlinear smooth model may be considered to be approximately linear. The centered data matrices $ \boldsymbol{\dot{X}}$ and $ \boldsymbol{\dot{Y}}$ do not involve $ \boldsymbol{x}_{0}$ or $ f(\boldsymbol{x}_{0})$ , and since the value of $ \boldsymbol{B}$ may be estimated from $ \boldsymbol{\dot{X}}$ and $ \boldsymbol{\dot{Y}}$ , there is no need to specify the value of the expansion point $ \boldsymbol{x}_{0}$ explicitly, nor the actual form of $ f$ , in order for this argument to work in a practical setting. Since the amount of nonlinearity is always small for $ \boldsymbol{x}$ close enough to $ \boldsymbol{x}_{0}$ , any smooth model may be considered approximately linear over ranges narrow enough that the remainder term is small compared with the noise level.

A good strategy for problems that may or may not be nonlinear is hence to attempt to use a linear model, and then to validate and optimize the model by means of the methods considered here and in the next module. If a well-fitting linear model is found in this way, no further action is needed. If the correct model is truly nonlinear, and a wide enough range of samples is included in the calibration for the nonlinearity to reveal itself, then the nonlinearity must be taken into account.
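The dependence of the linear approximation on the range of the data can be illustrated with a small numerical experiment. The sketch below assumes a hypothetical one-dimensional relation $ f(x)=e^{x}$ without noise, fits a straight line over a narrow and a wide interval around $ x_{0}=1$ , and reports the largest residual relative to the range of $ f$ on that interval. The function name `linear_fit_error` and the interval widths are chosen only for the example.

```python
import numpy as np

def linear_fit_error(half_width, x0=1.0, n=50):
    """Fit a straight line to f(x) = exp(x) on [x0 - w, x0 + w] and return
    the largest absolute residual divided by the range of f on that interval."""
    x = np.linspace(x0 - half_width, x0 + half_width, n)
    y = np.exp(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return np.max(np.abs(resid)) / (y.max() - y.min())

err_narrow = linear_fit_error(0.05)   # narrow range: nearly linear
err_wide = linear_fit_error(1.5)      # wide range: the curvature shows up
print(f"narrow: {err_narrow:.4f}, wide: {err_wide:.4f}")
```

With a narrow range the lack of fit would typically be hidden in the noise, whereas with a wide range it would show up as a systematic pattern in the residuals.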




9.3.2 Linearization by transformation


An important point to keep in mind is that some nonlinear models may be linearized by transforming either $ \boldsymbol{x}$ , $ \boldsymbol{y}$ or both by means of a suitable nonlinear function. Let us give two examples of this. Consider the relation

$\displaystyle y=\alpha e^{\beta x}$.

By taking logs on both sides of the equation and moving $ \log \alpha $ to the left-hand side of the equation, we obtain

$\displaystyle \log y-\log \alpha =\beta x$,

a linear model in terms of the transformed variable $ \log y$ . Note in particular that any constant term, such as $ \log \alpha $ , will disappear when $ \log y$ is centered. Such data may hence be analyzed by replacing $ y$ by $ \log y$ in the analysis. Similarly, if $ y$ is a product of powers, for example

$\displaystyle y=\alpha x_{1}^{\beta }x_{2}^{\gamma }$,

then taking logs and moving $ \log \alpha $ to the left-hand side gives

$\displaystyle \log y-\log \alpha =\beta \log x_{1}+\gamma \log x_{2}$.

This model may hence be analyzed by replacing both $ y$ and the two $ x$ -variables by their logs.
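The first example can be tried out numerically. The sketch below simulates data from $ y=\alpha e^{\beta x}$ with multiplicative noise (an assumption that keeps $ y$ positive), replaces $ y$ by $ \log y$ , and fits a straight line to the centered data. The parameter values and variable names are hypothetical. The power-law example works the same way with $ \log x_{1}$ and $ \log x_{2}$ as regressors.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data (assumption): y = alpha * exp(beta * x) with multiplicative noise.
alpha, beta = 2.0, 0.7
x = np.linspace(0.0, 5.0, 40)
y = alpha * np.exp(beta * x) * np.exp(0.02 * rng.normal(size=x.size))

# Replace y by log y and center; the constant log(alpha) disappears.
log_y = np.log(y)
xc = x - x.mean()
lc = log_y - log_y.mean()

# Slope of the centered linear model log y = beta * x.
beta_hat = np.sum(xc * lc) / np.sum(xc ** 2)

# alpha can be recovered from the means if it is needed.
alpha_hat = np.exp(log_y.mean() - beta_hat * x.mean())
print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```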

Such linearization techniques are discussed in many books on regression; see for example Draper and Smith (1998) and Atkinson (1985). More generally, suitable transformations of $ x$ and/or $ y$ may reduce the amount of nonlinearity in a given model. By far the most common transformation is the log transformation for positive variables, as already illustrated. It is hence advisable to always consider the possibility of taking logs for positive variables, in order to see if nonlinearity may be eliminated or reduced. This should not be done blindly, however, so the suitability of any given transformation should always be verified by the validation and optimization methods discussed in this module.




9.4 Outlier detection and diagnostics


Residual analysis consists of a set of diagnostic tools designed to detect outliers and other potential problems in a given calibration model.

We shall phrase the discussion of residual analysis in terms of the MLR method. We hence assume a model of the form

$\displaystyle \boldsymbol{y}=\boldsymbol{Xb}+\boldsymbol{f}$, (9.2)

where $ \boldsymbol{X}$ has dimension $ n\times k$ and rank $ k$ , and $ k<n$ . Assume that

$\displaystyle f_{1},\ldots ,f_{n}$

are independent and

$\displaystyle \mathrm{E}(f_{i})=0$,    $\displaystyle \mathrm{Var}(f_{i})=\sigma ^{2}$ for $\displaystyle i=1,\ldots ,n$.

The $ \boldsymbol{X}$ -matrix is not assumed to be centered. The methods may, however, also be applied to PCR and PLS. First of all, methods developed for a single $ y$ -column may simply be applied to the columns of the $ Y$ -block one column at a time. Second, when a model for $ \boldsymbol{X}$ is used, for example

$\displaystyle \boldsymbol{X}=\boldsymbol{TP}^{\top }+\boldsymbol{X}_{g+1}$ (9.3)

(both PCR and PLS), then $ \boldsymbol{X}$ in (9.2) is replaced by the scores matrix $ \boldsymbol{T}$ from (9.3), letting $ k=g$ (recall that $ g$ must be less than $ n$ ). In this sense, the methods are universally applicable.

We shall not discuss the special problems of residual analysis encountered in connection with CLS.




9.4.1 Crude residuals


  • Crude residuals

    $\displaystyle \widehat{f}_{i}=y_{i}-\widehat{y}_{i}$

  • Variance estimator

    $\displaystyle \widehat{\sigma }^{2}=\frac{1}{n-k}\sum_{i=1}^{n}\widehat{f}_{i}^{2}$

    Note that $ \mathrm{d.f.}=n-k$ is known as the degrees of freedom.
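A minimal sketch of these two quantities for an MLR fit is given below. The data are simulated, and the $ \boldsymbol{X}$ -matrix contains a column of ones because it is not assumed to be centered. All variable names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data (assumption): n = 30 samples, k = 3 columns including the intercept.
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
b_true = np.array([1.0, 2.0, -1.0])
y = X @ b_true + 0.5 * rng.normal(size=n)

# Least squares fit and fitted values.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat

# Crude residuals and the variance estimator with n - k degrees of freedom.
f_hat = y - y_hat
df = n - k
sigma2_hat = np.sum(f_hat ** 2) / df
print(f"d.f. = {df}, sigma2_hat = {sigma2_hat:.3f}")
```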




9.4.2 Hat matrix and leverage


  • Hat matrix

    $\displaystyle \boldsymbol{H}=\boldsymbol{X}(\boldsymbol{X}^{\top }\boldsymbol{X})^{-1}\boldsymbol{X}^{\top }$

  • Vector of fitted (predicted) values

    $\displaystyle \widehat{\boldsymbol{y}}=\boldsymbol{Hy}$,

    so

    $\displaystyle \widehat{y}_{i}=\sum_{j=1}^{n}h_{ij}y_{j}$

  • Interpretation: $ h_{ij}$ is the weight with which $ y_{j}$ enters $ \widehat{y}_{i}$

  • Define the $ i$ th leverage

    $\displaystyle h_{i}=h_{ii}$,

    the weight with which $ y_{i}$ enters $ \widehat{y}_{i}$ .

  • Properties

    $\displaystyle 0\leq h_{i}\leq 1$

    and

    $\displaystyle h_{ij}^{2}\leq h_{i}$,    $\displaystyle h_{ij}^{2}\leq h_{j}$

  • Average value of $ h_{i}$ is $ k/n$ , since

    $\displaystyle \sum_{i=1}^{n}h_{i}=k$

  • A leverage $ h_{i}$ is considered large, and hence a potential problem, if $ h_{i}>2k/n$ .

  • Note

    $\displaystyle \mathrm{Var}(\widehat{y}_{i})=\sigma ^{2}h_{i}$

    and

    $\displaystyle \mathrm{Var}(\widehat{f}_{i})=\sigma ^{2}\left( 1-h_{i}\right)$

    (variances depend on $ i$ ).

  • Hence $ h_{i}$ near 1 implies large variance for $ \widehat{y}_{i}$ and small variance for $ \widehat{f}_{i}$ .
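The properties listed above are easy to check numerically. The sketch below builds the hat matrix for a small simulated design in which one sample has been given deliberately extreme $ x$ -values (a hypothetical construction for the example), and flags the samples whose leverage exceeds $ 2k/n$ .

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated design (assumption): n = 25 samples, k = 3 columns including the intercept.
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
X[0, 1:] = [6.0, -6.0]            # one deliberately extreme sample

# Hat matrix H = X (X^T X)^{-1} X^T and the leverages h_i = h_ii.
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Samples with leverage above the rule-of-thumb limit 2k/n.
threshold = 2 * k / n
high_leverage = np.flatnonzero(h > threshold)
print(f"sum of leverages = {h.sum():.3f}, flagged samples: {high_leverage}")
```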




9.4.3 Standardized and Studentized residuals


  • Define standardized residuals

    $\displaystyle \widetilde{f}_{i}=\frac{\widehat{f}_{i}}{\widehat{\sigma }\sqrt{1-h_{i}}}$;

    all have approximately variance 1.

  • $ \widetilde{f}_{i}$ de-emphasizes outliers, because $ \widehat{\sigma }^{2}$ is also large when $ \left\vert \widehat{f}_{i}\right\vert $ is large.

  • Define Studentized (cross-validation) residuals

    $\displaystyle \widehat{t}_{-i}=\frac{\widehat{f}_{i}}{\widehat{\sigma }_{-i}\sqrt{1-h_{i}}}$

    where $ \widehat{\sigma }_{-i}^{2}$ is the estimate for $ \sigma ^{2}$ leaving out $ y_{i}$ ,

    $\displaystyle \widehat{\sigma }_{-i}^{2}=\frac{1}{n-k-1}\left\{ (n-k)\widehat{\sigma }^{2}-\frac{\widehat{f}_{i}^{2}}{1-h_{i}}\right\}$

  • Studentized residuals emphasize outliers.

  • When no outliers are present and the errors are normally distributed, $ \widehat{t}_{-i}$ follows a $ t(n-k-1)$ distribution.

  • When $ n-k$ is large, a sample is considered an outlier if either $ \left\vert \widetilde{f}_{i}\right\vert $ or $ \left\vert \widehat{t}_{-i}\right\vert $ is (much) greater than 2.

  • If outliers come in pairs, triples or larger groups, the deletion of one data point at a time does not change the fit much. This is called a masking effect.

  • It may hence be useful to delete data in pairs, triplets etc. and compare with the fit based on the remaining data.
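The two kinds of residuals can be compared on a small simulated example with one gross error (the data and the position of the error are assumptions made for illustration). The sketch also checks the closed-form expression for $ \widehat{\sigma }_{-i}^{2}$ against an explicit refit without the sample in question.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated straight-line data (assumption) with one gross error in sample 10.
n, k = 30, 2
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
y = 1.0 + 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)
y[10] += 1.5

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
f_hat = y - H @ y
sigma2_hat = np.sum(f_hat ** 2) / (n - k)

# Standardized residuals.
f_std = f_hat / np.sqrt(sigma2_hat * (1.0 - h))

# Leave-one-out variance estimates and Studentized residuals (closed form).
sigma2_loo = ((n - k) * sigma2_hat - f_hat ** 2 / (1.0 - h)) / (n - k - 1)
t_loo = f_hat / np.sqrt(sigma2_loo * (1.0 - h))

# Check the closed form against an explicit refit without sample 10.
keep = np.arange(n) != 10
b_loo, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
sigma2_refit = np.sum((y[keep] - X[keep] @ b_loo) ** 2) / (n - 1 - k)

print(f"standardized: {f_std[10]:.2f}, Studentized: {t_loo[10]:.2f}")
```

The Studentized residual of the contaminated sample comes out larger in absolute value than its standardized residual, in line with the remarks above on emphasizing and de-emphasizing outliers.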




9.4.4 Cook's distance


  • Define Cook's distance $ D_{i}$ by
    \begin{displaymath}
    \begin{array}{rcl}
    D_{i} & = & \frac{1}{\widehat{\sigma }^{2}k}(\widehat{\boldsymbol{b}}_{-i}-\widehat{\boldsymbol{b}})^{\top }\boldsymbol{X}^{\top }\boldsymbol{X}(\widehat{\boldsymbol{b}}_{-i}-\widehat{\boldsymbol{b}}) \\
     & = & \frac{1}{\widehat{\sigma }^{2}k}(\widehat{\boldsymbol{y}}_{-i}-\widehat{\boldsymbol{y}})^{\top }(\widehat{\boldsymbol{y}}_{-i}-\widehat{\boldsymbol{y}}),
    \end{array}
    \end{displaymath}

    where $ \widehat{\boldsymbol{b}}_{-i}$ is the vector of estimates obtained when leaving out $ y_{i}$ , and $ \widehat{\boldsymbol{y}}_{-i}=\boldsymbol{X}\widehat{\boldsymbol{b}}_{-i}$ is the corresponding vector of fitted values.

  • Cook's $ D_{i}$ weighs together $ \widetilde{f}_{i}$ and $ h_{i}$ , and we have the expression

    $\displaystyle D_{i}=\frac{\widetilde{f}_{i}^{2}}{k}\frac{h_{i}}{1-h_{i}}$.

    Hence $ D_{i}$ can help us decide if a large residual is actually influential, in the sense of having changed the fit considerably.
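The agreement between the definition of $ D_{i}$ and the expression in terms of $ \widetilde{f}_{i}$ and $ h_{i}$ can be verified numerically. The sketch below uses simulated data in which the last sample is both a high-leverage point and an outlier in $ y$ (a hypothetical construction), and computes $ D_{i}$ both by explicit deletion and by the shortcut formula.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data (assumption): the last sample has high leverage and a shifted y.
n, k = 20, 2
x = np.linspace(0.0, 1.0, n)
x[-1] = 3.0
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.0 * x + 0.1 * rng.normal(size=n)
y[-1] += 1.0

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # diagonal of the hat matrix
f_hat = y - X @ b_hat
sigma2_hat = np.sum(f_hat ** 2) / (n - k)
f_std = f_hat / np.sqrt(sigma2_hat * (1.0 - h))

# Shortcut formula: D_i = (f_std_i^2 / k) * h_i / (1 - h_i).
cook_shortcut = f_std ** 2 / k * h / (1.0 - h)

def cook_by_deletion(i):
    """Cook's distance from the definition, refitting without sample i."""
    keep = np.arange(n) != i
    b_loo, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    d = X @ (b_loo - b_hat)                      # y_hat_{-i} - y_hat
    return d @ d / (sigma2_hat * k)

cook_definition = np.array([cook_by_deletion(i) for i in range(n)])
print(f"largest D_i at sample {np.argmax(cook_shortcut)}")
```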




9.4.5 Properties of residuals and leverage


A summary of the properties of the residuals and leverage is shown in Table 9.1.

Table 9.1: Properties of residuals

|          | Hat | Crude | Standardized | Studentized | Cook's |
|----------|-----|-------|--------------|-------------|--------|
| Symbol   | $ h_{i}$ | $ \widehat{f}_{i}$ | $ \widetilde{f}_{i}$ | $ \widehat{t}_{-i}$ | $ D_{i}$ |
| Mean     | $ \frac{k}{n}$ | 0 | 0 | 0 | -- |
| Variance | -- | $ \sigma ^{2}(1-h_{i})$ | 1 | $ \approx 1$ | -- |
| Limits   | 0 to $ 2k/n$ | ? | $ \pm 2$ | $ \pm 2$ | 1 |
| Usage    | Leverage | ? | Plots | Outliers | Influence |


Summary of residual plots

  1. Plot $ \widetilde{f}_{i}$ against $ \widehat{y}_{i}$ , and check for trumpet form (transform $ y$ if necessary).

  2. Plot $ \widetilde{f}_{i}$ against $ x_{ij}$ for all $ j$ , and check for nonlinear form (transform $ \boldsymbol{x}_{j}$ if necessary).

  3. Plot $ \widehat{t}_{-i}^{2}$ against $ h_{i}$ , and check for points outside the limits for $ \widehat{t}_{-i}^{2}$ or $ h_{i}$ or both.

  4. Plot $ D_{i}$ against $ i$ , and look for $ D_{i}$ bigger than $ 1$ .

  5. A final model check is to plot $ \widehat{y}_{i}$ against $ y_{i}$ .




9.4.6 Robustness


There are many robust methods available in statistics, for example robust regression, where outliers are downweighted in order to limit their potential influence on the results. The purpose of such 'automatic' robust methods is to make the results more stable and thus more reliable.
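To give an idea of what downweighting can look like in practice, here is a minimal sketch of one possible robust regression scheme: iteratively reweighted least squares with Huber weights and a scale estimate based on the median absolute deviation. This particular scheme, the tuning constant `c = 1.345` and the simulated data are assumptions made for illustration, not a method taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated straight-line data (assumption) with three gross errors at the end.
n = 40
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + 0.2 * rng.normal(size=n)
y[-3:] += 8.0

def huber_irls(X, y, c=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)        # start from ordinary LS
    for _ in range(n_iter):
        r = y - X @ b
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))   # robust scale (MAD)
        u = np.abs(r) / (c * scale)
        w = 1.0 / np.maximum(u, 1.0)                 # weight 1 for small residuals
        sw = np.sqrt(w)
        b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b, w

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_rob, weights = huber_irls(X, y)
print(f"slope: OLS {b_ols[1]:.3f}, robust {b_rob[1]:.3f} (simulated value 0.5)")
```

The returned weights can also serve as a diagnostic, since samples with small weights are the ones the procedure has treated as outliers.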

The same goal can, however, also be achieved by more simple-minded procedures, where outliers are detected by inspection, and then treated appropriately, which is the method chosen here. One should nevertheless be aware that manual outlier detection and handling is time-consuming and requires a certain amount of experience. Automatic outlier handling therefore also has its place in the analyst's tool bag, especially in connection with large-scale automated data handling.

When errors have occurred as a result of using samples from other populations than the intended one, or using samples with erroneous $ x$ - or $ y $ -values due to human or technical error, this may result in outliers. Robustness against such errors may be achieved by identifying the samples or variables in question and eliminating them from the calibration set. Besides eliminating the problems that would have been caused by such an outlier, this is a useful exercise in itself, because it leads one to identify weaknesses in procedures and techniques, and may hence lead one to take a critical view of the whole process under study. For this reason, manual outlier detection and handling is an indispensable tool, especially in the initial phases of a calibration project.

Robustness against measurement errors and abnormal distributions may be achieved by including many samples and variables in the calibration. In this way, each individual sample or measurement has less influence on the final result, and parameter estimation becomes less sensitive to random noise. Appropriate spanning of the calibration space, using methods of experimental design and common sense, serves the same purpose.

If, for economic reasons or otherwise, one is unable to include many samples in the calibration, one may in certain situations encounter problems. This may happen if variables are heteroscedastic (when the variance is a function of the mean, rather than constant) or non-normal (not from the normal distribution). In any case, the amount of information contained in a calibration set is, generally speaking, proportional to the number of calibration samples. No amount of sophistication can substitute for proper and relevant calibration samples. Guidance about the appropriate choice of the number of calibration samples needed in order to obtain a certain accuracy and robustness is part of the methods of experimental design. It may be useful to conduct a small pilot study under realistic circumstances in order to obtain information necessary for proper experimental design.




9.5 Graphical displays


Graphical displays are very important in practical data analysis, and we have already used them in the data examples in the previous modules. We now review the main types of plots and their purposes.




9.5.1 General data displays


Purpose: To obtain an overview of the full data set.

  1. Spectral plot (index plot for $ \boldsymbol{X}$ ). An index plot is a plot of the $ n$ profiles $ \left\{ (j,x_{ij}):j=1,\ldots ,k\right\} $ for $ i=1,\ldots ,n$ , usually taking the abscissa to be either frequency or wavelength.

  2. Composition plot (index plot for $ \boldsymbol{Y}$ ).

  3. Scatterplot (plot of a $ y$ -column versus an $ x$ -column). Useful only for small $ k$ and $ m$ , such as simple linear regression.




9.5.2 Latent variable plots


Purpose: To summarize the information obtained from a PCR or PLS analysis, and detect problems, such as nonlinearity.

  1. Scree plot (index plot of eigenvalues or component variability).

  2. Plot of percent variance explained (cumulative plot of eigenvalues or component variability).

  3. Loading plots of the $ \boldsymbol{p}_{j}$ -vectors (PCR) or the $ \boldsymbol{w}_{j}$ -vectors (PLS), for example:

    • Index plots for each $ \boldsymbol{p}_{j}$ .

    • Scatterplot of $ \boldsymbol{p}_{2}$ versus $ \boldsymbol{p}_{1}$ .

  4. Score plots of the $ \boldsymbol{t}_{j}$ -vectors (PCR or PLS), for example:

    • Index plots for each $ \boldsymbol{t}_{j}$ .

    • Scatterplot of $ \boldsymbol{t}_{2}$ versus $ \boldsymbol{t}_{1}$ .

    • Scatterplot of each $ \boldsymbol{t}_{j}$ versus each $ \boldsymbol{y}$ -column.
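The quantities behind the scree plot, the plot of percent variance explained, and the loading and score plots can all be obtained from one singular value decomposition of the centered $ \boldsymbol{X}$ -matrix. The sketch below does this for the principal components case on simulated data with two dominant latent variables. The data and variable names are assumptions, and the plotting itself is left out.

```python
import numpy as np

rng = np.random.default_rng(9)

# Simulated spectra (assumption): n = 20 samples, k = 50 variables, 2 latent variables.
n, k, g = 20, 50, 2
T_true = rng.normal(size=(n, g)) * np.array([5.0, 2.0])
P_true = np.linalg.qr(rng.normal(size=(k, g)))[0]    # orthonormal loadings
X = T_true @ P_true.T + 0.05 * rng.normal(size=(n, k))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scree plot: component variances (eigenvalues) against component number.
eigenvalues = s ** 2 / (n - 1)

# Percent variance explained: cumulative sum of the eigenvalues.
percent_explained = 100.0 * np.cumsum(eigenvalues) / np.sum(eigenvalues)

# Columns of these matrices are the t_j and p_j vectors used in the plots.
scores = U * s
loadings = Vt.T

print(np.round(percent_explained[:4], 2))
```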




9.5.3 Plots of residuals and fitted values


Purpose: To detect outliers, nonlinearity, lack of fit, or other problems with the calibration model.

  • Residual plots; see the previous section for details. Residual plots for PCR and PLS should be based on the scores matrix $ T$ .

  • Fitted values plot (scatterplot of $ \widehat{y}$ versus $ y$ for each $ y$ -variable, or $ \widehat{x}$ versus $ x$ for each $ x$ -variable). Useful for evaluating the quality of the fit of a particular model.

  • Plot of observed and fitted values, in the form of composition plots for $ \widehat{Y}$ and $ Y$ in the same plot. Can also be done for $ \boldsymbol{X}$ .

Bibliography

Atkinson, A.C. (1985). Plots, Transformations, and Regression. An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford: Clarendon Press.

Draper, N.R. and Smith, H. (1998). Applied Regression Analysis (3rd Ed.). New York: John Wiley & Sons.


Last modified January 29, 2007.
©2001-2005 Master Of Applied Statistics