We assume from now on that the data matrices $X$ and $Y$ are centered,
unless otherwise stated. This will help simplify the notation. For example,
$X^\top X$ is then proportional to the covariance matrix for $X$, namely
$X^\top X/(n-1)$, where $n$ is the number of rows of $X$.
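The effect of centering can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up data matrix `X_raw` (not data from this module), with rows as observations and columns as variables. It compares $X^\top X/(n-1)$ for the centered matrix with NumPy's own sample covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: n = 50 observations (rows), 3 variables (columns),
# with non-zero column means.
X_raw = rng.normal(size=(50, 3)) + np.array([10.0, -2.0, 5.0])

# Center each column by subtracting its mean.
X = X_raw - X_raw.mean(axis=0)
n = X.shape[0]

# For a centered X, X^T X / (n - 1) is the sample covariance matrix.
S = X.T @ X / (n - 1)

# np.cov uses the same n - 1 divisor by default.
print(np.allclose(S, np.cov(X_raw, rowvar=False)))
```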
You must remember this.
A kiss is just a kiss.
A sigh is just a sigh.
The fundamental things apply.
As time goes by.
[Casablanca, 1942]
In this module we consider Principal Components Analysis (PCA). This is a
general method for analysis of multivariate data, which will be applied in
connection with the principal components regression in the next module. But
first we consider the singular value decomposition.
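Recall the eigenvalue decomposition from Module 2. Thus let $A$ be a square
symmetric $m \times m$ matrix, and consider its eigenvalue decomposition
$$A = P \Lambda P^\top .$$
Here
- $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is the diagonal matrix of eigenvalues.
- $P$ is an orthogonal matrix with columns $p_1, \ldots, p_m$, the eigenvectors.
- We may assume that the eigenvalues are ordered: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$.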
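As an illustration, the eigenvalue decomposition of a symmetric matrix can be computed as sketched below. The matrix `A` is a made-up example, and the names `lam`, `P` and `Lambda` are chosen here for illustration. Note that `numpy.linalg.eigh` returns the eigenvalues in ascending order, so they are reordered to match the decreasing convention.

```python
import numpy as np

# Hypothetical symmetric 3 x 3 matrix.
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])

# eigh is meant for symmetric matrices; eigenvalues come back ascending.
lam, P = np.linalg.eigh(A)

# Reorder so that the eigenvalues are decreasing, and permute the
# eigenvectors (columns of P) accordingly.
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]
Lambda = np.diag(lam)

print(np.allclose(P @ Lambda @ P.T, A))   # the decomposition reproduces A
print(np.allclose(P.T @ P, np.eye(3)))    # P is orthogonal
```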
When $X$ is a centered data matrix with $n$ rows, then $X^\top X/(n-1)$
is the covariance matrix of $X$. The
method of studying the eigenvalue decomposition of the covariance matrix is
known as Principal Components Analysis (PCA). We consider the use of
PCA as a multivariate data analysis tool below.
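The link between the covariance eigenvalue decomposition and the SVD of the data matrix can be sketched numerically. The example assumes NumPy, simulated data, and the SVD notation $X = U D V^\top$ (a notation chosen here, not taken from the module). The squared singular values of the centered $X$, divided by $n-1$, should match the eigenvalues of the covariance matrix, and the right singular vectors should match the eigenvectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 40 observations of 4 correlated variables.
X = rng.normal(size=(40, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)          # center the columns
n = X.shape[0]

# Eigenvalue decomposition of the covariance matrix.
S = X.T @ X / (n - 1)
lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]  # decreasing order

# SVD of the centered data matrix itself; d holds the singular values.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values / (n - 1) equal the covariance eigenvalues.
print(np.allclose(d**2 / (n - 1), lam))
# Right singular vectors equal the eigenvectors up to a sign per column.
print(np.allclose(np.abs(Vt.T), np.abs(P), atol=1e-6))
```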
A second connection between the eigenvalue decomposition and SVD comes if we
apply the SVD method directly to $A$ itself. We consider two
cases. If $A$ is positive semi-definite, then its eigenvalue
decomposition
$$A = P \Lambda P^\top \tag{5.2}$$
is the same as the SVD for $A$. In that case, writing the SVD as
$A = U D V^\top$, the matrices of the SVD for $A$ are $U = V = P$, and the
diagonal matrices $D = \Lambda$ of SVD and eigenvalue decomposition are the same.
The eigenvalues and singular values for $A$ are the same, and
we normally arrange them in decreasing order,
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$.
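Now, the second case is if $A$ is not positive semi-definite.
In this case, a simple rearrangement of (5.2) leads to the SVD for
$A$. Thus, for each negative eigenvalue $\lambda_j$, we
replace $\lambda_j$ in $\Lambda$ by $-\lambda_j = |\lambda_j|$ to form the
diagonal matrix $D$ of singular values, and form $V$ from $P$ by
multiplying the corresponding columns of $P$ by $-1$. Taking $U = P$, we
then have $A = U D V^\top$, which is an SVD for $A$ once the singular values
are rearranged in decreasing order.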
In both cases we may easily find the eigenvalue decomposition for $A$
by applying the SVD to $A$.
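The rearrangement for a symmetric matrix with negative eigenvalues can be sketched as follows. The matrix `A` is a made-up example, and the names `U`, `d` and `V` assume the SVD is written $A = U D V^\top$. This is an illustration of the sign-flipping argument, not a recommended way to compute an SVD.

```python
import numpy as np

# Hypothetical symmetric matrix with one negative eigenvalue (-3).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, -3.0, 0.0],
              [1.0, 0.0, 1.0]])

lam, P = np.linalg.eigh(A)

# Replace each negative eigenvalue by its absolute value ...
d = np.abs(lam)
# ... and flip the sign of the corresponding columns of P to obtain V.
signs = np.where(lam < 0, -1.0, 1.0)
U = P
V = P * signs          # broadcasting multiplies column j by signs[j]

# Rearrange so that the singular values are decreasing.
order = np.argsort(d)[::-1]
d, U, V = d[order], U[:, order], V[:, order]

print(np.allclose(U @ np.diag(d) @ V.T, A))
print(np.allclose(d, np.linalg.svd(A, compute_uv=False)))
```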
The SVD will serve as a tool for the method of principal components
regression (PCR) considered in the next module.
PCA may be considered a tool for discovering structures in multivariate
data, in particular for the purpose of reducing the dimensionality. PCA, in
effect, takes your cloud of data points, and rotates and projects it onto a
space of lower dimension, selecting the directions in the data space with
maximum variability, or equivalently high information.
For example, a spectral block $X$ contains a lot of redundant
information, because absorbances for adjacent frequencies are highly
correlated, and because features stemming from a given analyte are spread
out over a range of different frequencies. We hence want to find out if
there are one, two, or a few factors (directions) along which the spectra
show high variability.
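The idea of a few factors carrying most of the variability can be illustrated with simulated spectra. Everything in this sketch is an assumption made for illustration: two Gaussian-shaped "pure spectra", random concentrations, a small noise level, and the names `loadings` and `scores`. The centered spectra are projected onto the first `k = 2` directions found by the SVD.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 30, 100, 2            # samples, frequencies, retained components

# Two hypothetical pure spectra, each a smooth peak over many frequencies.
freq = np.linspace(0.0, 1.0, p)
pure = np.vstack([np.exp(-0.5 * ((freq - 0.3) / 0.08) ** 2),
                  np.exp(-0.5 * ((freq - 0.7) / 0.10) ** 2)])

# Simulated spectra: random mixtures of the pure spectra plus small noise.
conc = rng.uniform(0.0, 1.0, size=(n, 2))
X = conc @ pure + 0.01 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)          # center the columns

U, d, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt[:k].T             # p x k: directions of maximum variability
scores = X @ loadings           # n x k: coordinates in the reduced space

# Fraction of the total variability captured by the first k directions.
explained = (d[:k] ** 2).sum() / (d ** 2).sum()
print(scores.shape, round(float(explained), 3))
```

Here 100 correlated absorbances per sample are summarized by two scores per sample, with little variability left over.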
Figure 5.1 shows an example of loadings and scores
plots for a set of simulated data.

Figure 5.1: Loadings plot (a) and scores plot (b) for simulated data.
The loadings plot shows the loadings vectors, giving the
directions with maximum variability. In higher dimensions, these plots must
be made as spectral plots, as illustrated in the Examples section.
The first loadings vector is, in effect, chosen as the line through the
centroid of the data that minimizes the sum of the squared distances from
the data points to the line. Thus, in effect, the line is as close as
possible to all data, and therefore shows the direction in the data with
maximum variation. The second loadings vector is orthogonal to the first,
and subject to that constraint satisfies the same condition as the first
loadings vector, and so on.
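The equivalence between "closest line" and "maximum variation" follows from Pythagoras' theorem. As a short worked derivation, with notation assumed here: let $x_i^\top$ denote the $i$th row of the centered data matrix $X$, and let $p$ be a unit vector giving the direction of a line through the centroid (the origin).

```latex
\begin{align*}
\|x_i\|^2 &= (x_i^\top p)^2 + \|x_i - (x_i^\top p)\,p\|^2
  && \text{(projection plus residual)} \\
\sum_{i} \|x_i - (x_i^\top p)\,p\|^2
  &= \sum_{i} \|x_i\|^2 - p^\top X^\top X\, p
  && \text{(summing over all points)}
\end{align*}
```

The first term on the right does not depend on $p$, so minimizing the sum of squared distances to the line is the same as maximizing $p^\top X^\top X p$ over unit vectors $p$. That maximum is attained by the eigenvector of $X^\top X$ with the largest eigenvalue.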
The size of each eigenvalue $\lambda_j$ relative to the sum
$\sum_i \lambda_i$ is used as a measure of the importance of the
corresponding principal component. This is discussed in more detail in
Module 6.
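The relative sizes are simple to compute. A minimal sketch, using made-up eigenvalues in decreasing order (the array `lam` is hypothetical, not a result from this module):

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, in decreasing order.
lam = np.array([5.2, 1.1, 0.4, 0.2, 0.1])

# Proportion of the total variance carried by each principal component.
proportion = lam / lam.sum()

# Cumulative proportion: variance retained by the first 1, 2, ... components.
cumulative = np.cumsum(proportion)

for j, (pr, cu) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{j}: {pr:.3f} (cumulative {cu:.3f})")
```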
The scores plots are traditionally made by plotting the first two scores
against each other, in order to show the main features of the data, such as
for example groupings or outliers. The first three scores may be plotted in
a 3-d perspective plot, but if more than three scores are to be plotted,
they must be plotted as spectra.
As noted earlier, in Section 3.2.3, scaling is usually not necessary for
spectral data, because they all have the same unit. For other kinds of
$X$-data, scaling may be necessary if the different $X$-variables have very
different magnitudes, in which case $X$ should be autoscaled, that is, each
column is centered and divided by its standard deviation.
This, in effect, corresponds to using PCA on the correlation matrix for
$X$ instead of the covariance matrix of $X$.
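The correspondence between autoscaling and the correlation matrix can be verified directly. This is a minimal sketch, assuming NumPy and simulated variables with deliberately different magnitudes; the names `X_raw`, `Z` and `R` are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: three variables on very different scales.
X_raw = rng.normal(size=(60, 3)) * np.array([1.0, 100.0, 0.01])

# Autoscale: center each column, then divide by its standard deviation
# (ddof=1 gives the usual n - 1 divisor).
Xc = X_raw - X_raw.mean(axis=0)
Z = Xc / Xc.std(axis=0, ddof=1)
n = Z.shape[0]

# The covariance matrix of the autoscaled data ...
R = Z.T @ Z / (n - 1)

# ... is the correlation matrix of the original data.
print(np.allclose(R, np.corrcoef(X_raw, rowvar=False)))
```

PCA on `Z` therefore gives every variable unit variance, so no single variable dominates merely because of its unit of measurement.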