View/Print PDFPS
Module 2: Matrix algebra
By Bent Jørgensen and Yuri Goegebeur


Table of Contents





2.1 Matrices, vectors and scalars

Previous Section
Next Section

Ugarte: You despise me, don't you? Rick Blaine: If I gave you any thought I probably would. [Casablanca, 1942]

The following quote serves as a suitable introduction to this module:

Linear algebra is the language of chemometrics. One cannot expect to truly understand most chemometric techniques without a basic understanding of linear algebra. This [module] reviews the basics of linear algebra and provides the reader with the foundations required for understanding most chemometrics literature. It is presented in a rather dense fashion. [...] The goal has been to condense into as few pages as possible the aspects of linear algebra used in most chemometrics methods. [Wise and Gallagher, 1998]

Note that in this module, we use a mixture of the general notation introduced in Module 1, and a local notation, where letters are used freely to denote different vectors and matrices etc.

  • A single number is usually called a scalar, and is represented by math italics, e.g. $ c$ .

  • A matrix is a two dimensional array of numbers and is represented by bold upper case math italics e.g. $ \boldsymbol{X}$ . For example

    \begin{displaymath}
\boldsymbol{X}=\left[
\begin{array}{ccc}
12 & 3 & 8 \\
-2.3 & 4 & 1\end{array}\right]
\end{displaymath}

    is a $ 2\times 3$ matrix, having 2 rows and 3 columns. The dimensions of a matrix are normally presented with the number of rows first and the number of columns second. We use square brackets to represent a matrix, in order to emphasize its rectangular shape.

  • Indices: The entries of a matrix are denoted by the same letter as the matrix, but in lower case math italics with two indices, e.g. $ \boldsymbol{X}=\{x_{ij}\}$ for $ i=1,\ldots ,2$ (rows) and $ j=1,\ldots ,3$ (columns). For example, in the matrix $ \boldsymbol{X}$ above, the (2,1) element is $ x_{21}=-2.3$ .

  • A vector consists of a row or column of numbers and is represented by bold lower case italics e.g. $ \boldsymbol{x}$ . Vectors can be considered as matrices with one dimension equal to 1. For example

    \begin{displaymath}
\boldsymbol{z}=\left[
\begin{array}{cccc}
3 & -14 & 9.4 & 0\end{array}\right]
\end{displaymath}

    is a $ 1\times 4$ row vector (sometimes the entries are separated by commas), and

    \begin{displaymath}
\boldsymbol{w}=\left[
\begin{array}{c}
5.6 \\
2.8 \\
1.9\end{array}\right]
\end{displaymath}

    a $ 3\times 1$ column vector.




2.2 Special matrices

Previous Section
Next Section

  • A square matrix is one that has the same number of rows and columns, e.g.

    \begin{displaymath}
\boldsymbol{A}=\left[
\begin{array}{ccc}
-7 & \text{\ \ }4 ...
... }4.1 \\
\text{\ \ }3 & \text{\ \ }4 & -12\end{array}\right]
\end{displaymath}

    is a $ 3\times 3$ square matrix.

  • The main diagonal of a square matrix consists of the numbers $ a_{ii}$ with equal row and column indices, and is usually represented as a vector. So

    \begin{displaymath}
\left[
\begin{array}{ccc}
-7 & -3 & -12\end{array}\right]
\end{displaymath}

    is the row vector representing the main diagonal of the matrix $ \boldsymbol{A}$ from above.

  • A square matrix is called diagonal when all the non-diagonal entries are zero, e.g.

    \begin{displaymath}
\boldsymbol{A}=\left[
\begin{array}{cc}
1 & 0 \\
0 & 4\end{array}\right] .
\end{displaymath}

    We let $ \mathrm{diag}\{a_{1},\boldsymbol{\ldots },a_{n}\}$ denote the diagonal matrix with main diagonal entries $ a_{1},\boldsymbol{\ldots },a_{n}$ . We let $ \mathrm{diag}\boldsymbol{A}$ denote the row vector containing the diagonal of $ \boldsymbol{A}$ .

  • An identity matrix (sometimes called a unit matrix) is a diagonal matrix whose main diagonal elements are equal to 1. It is denoted by $ \boldsymbol{I}$ . For example

    \begin{displaymath}
\boldsymbol{I}=\left[
\begin{array}{cc}
1 & 0 \\
0 & 1\end{array}\right]
\end{displaymath}

    is an identity matrix. A subscript may be added to emphasize the dimension of the matrix, so $ \boldsymbol{I}_{n}$ denotes the $ n\times n$ identity matrix.

  • A zero matrix or vector is any matrix or vector whose entries are all zero. For example

    \begin{displaymath}
\boldsymbol{0}=\left[
\begin{array}{cc}
0 & 0 \\
0 & 0 \\
0 & 0\end{array}\right]
\end{displaymath}

    is the $ 3\times 2$ zero matrix, which may also be denoted $ \boldsymbol{0}_{3\times 2}$ to emphasize its dimensions. In particular, we use $ \boldsymbol{0}$ to denote the origin of a vector space.

  • A 'One' matrix or vector is any matrix or vector whose entries are all 1, e.g.

    \begin{displaymath}
\boldsymbol{1}=\left[
\begin{array}{cc}
1 & 1 \\
1 & 1 \\
1 & 1\end{array}\right]
\end{displaymath}

    Again its dimensions may be added as a subscript for emphasis, e.g. $ \boldsymbol{1}_{3\times 2}$ .




2.3 Matrix and vector operations

Previous Section
Next Section

  • Addition and subtraction: Two matrices can be added or subtracted componentwise if they have identical dimensions (are conformable for addition). Hence

    \begin{displaymath}
\left[
\begin{array}{cc}
\text{\ \ \ }9 & 7 \\
-8 & 4 \\
...
...y}{cc}
\ \ 9 & 0 \\
-19 & 7 \\
\ \ 3 & 10\end{array}\right]
\end{displaymath}

    and

    \begin{displaymath}
\left[
\begin{array}{cc}
\text{\ \ \ }9 & 7 \\
-8 & 4 \\
...
...ray}{cc}
\ \ 9 & 14 \\
3 & 1 \\
\ -7 & -2\end{array}\right]
\end{displaymath}

  • Scalar multiplication: A matrix can be multiplied by a constant, which is done componentwise. For example, if

    \begin{displaymath}
\boldsymbol{X}=\left[
\begin{array}{ccc}
12 & 3 & 8 \\
-2.3 & 4 & 1\end{array}\right]
\end{displaymath}

    then the scalar multiple $ 3\boldsymbol{X}$ is

    \begin{displaymath}
3\boldsymbol{X}=\left[
\begin{array}{ccc}
36 & 9 & 24 \\
-6.9 & 12 & 3\end{array}\right]
\end{displaymath}

  • The transpose $ \boldsymbol{X}^{\top }$ of a matrix $ \boldsymbol{X}$ is the matrix formed by letting columns become rows (or rows columns), and is denoted by the superscript $ \top $ . So, for example, the $ 4\times 3$ matrix

    \begin{displaymath}
\boldsymbol{X}=\left[
\begin{array}{ccc}
3.1 & 0.2 & 6.1 \\...
...3.8 \\
20 & -1 & 2.2 \\
5.1 & 3.8 & -3.94\end{array}\right]
\end{displaymath}

    becomes a $ 3\times 4$ matrix after transposition,

    \begin{displaymath}
\boldsymbol{X}^{\top }=\left[
\begin{array}{cccc}
3.1 & -4....
... & -1 & 3.8 \\
6.1 & -13.8 & 2.2 & -3.94\end{array}\right] .
\end{displaymath}

    Some authors use the superscript $ ^{\prime }$ instead of $ \top $ for transpose.

  • A square matrix $ \boldsymbol{A}$ is called symmetric if $ \boldsymbol{A}=\boldsymbol{A}^{\top }$ , e.g.

    \begin{displaymath}
\boldsymbol{A}=\left[
\begin{array}{cc}
1 & 2 \\
2 & 4\end{array}\right] .
\end{displaymath}

  • Linear combinations: If $ \boldsymbol{a}_{1},\ldots ,\boldsymbol{a}_{k}$ are vectors of common dimension, and $ c_{1},\ldots
,c_{k} $ are constants, then the vector

    $\displaystyle c_{1}\boldsymbol{a}_{1}+\cdots +c_{k}\boldsymbol{a}_{k}
$

    is called a linear combination of $ \boldsymbol{a}_{1},\ldots ,\boldsymbol{a}_{k}$ with coefficients $ c_{1},\ldots ,c_{k}$ .

  • The linear subspace spanned by $ \boldsymbol{a}_{1},\ldots ,\boldsymbol{a}_{k}$ is the set of all linear combinations of $ \boldsymbol{a}_{1},\ldots ,\boldsymbol{a}_{k}$ with arbitrary $ c_{1},\ldots ,c_{k}$ .

  • Vectors $ \boldsymbol{a}_{1},\ldots ,\boldsymbol{a}_{k}$ are said to be linearly independent if

    $\displaystyle c_{1}\boldsymbol{a}_{1}+\cdots +c_{k}\boldsymbol{a}_{k}=\boldsymbol{0}
$

    implies $ c_{1}=\cdots =c_{k}=0$ .

  • The rank of a matrix is the maximum number of linearly independent columns of the matrix. For example

    \begin{displaymath}
\mathrm{rank}\left[
\begin{array}{cc}
1 & 2 \\
3 & 4\end{array}\right] =2,
\end{displaymath}

    but

    \begin{displaymath}
\mathrm{rank}\left[
\begin{array}{cc}
1 & 2 \\
3 & 6\end{array}\right] =1.
\end{displaymath}

    because the vectors \begin{displaymath}\left[
\begin{array}{c}
1 \\
3\end{array}\right] \end{displaymath} and \begin{displaymath}\left[
\begin{array}{c}
2 \\
6\end{array}\right] \end{displaymath} are linearly dependent. The rank is also equal to the maximum number of linearly independent rows of the matrix.

  • Geometric interpretation: consider for example the three columns $ \boldsymbol{x}_{1}$ , $ \boldsymbol{x}_{2}$ , $ \boldsymbol{x}_{3}$ of the above matrix $ \boldsymbol{X}$ . They are 4-vectors. If they all point in the same direction (up to a sign), then the rank of $ \boldsymbol{X}$ is 1. If this is not the case, but they all lie within a plane through $ \boldsymbol{0}$ , then $ \boldsymbol{X}$ has rank 2. If this is not the case, then the rank is 3.

  • The maximum possible rank for a matrix is the smallest of its column and row dimensions. In case of maximum rank the matrix is said to have full rank, or be non-singular.

  • A singular matrix is also called rank-deficient, or is said to have a problem of multicollinearity.




2.4 Matrix multiplication

Previous Section
Next Section

  • If matrix $ \boldsymbol{A}=\{a_{ij}\}$ has dimensions $ k\times m$ and matrix $ \boldsymbol{B}=\{b_{ij}\}$ has dimensions $ m\times n$ (having the second dimension of $ \boldsymbol{A}$ and the first dimension of $ \boldsymbol{B}$ in common), they are said to be comformable for multiplication. Then the matrix product $ \boldsymbol{C}=\{c_{i\ell }\}$ is a matrix of dimensions $ k\times n$ with elements defined by

    $\displaystyle c_{i\ell }=\sum\limits_{j=1}^{m}a_{ij}b_{j\ell }
$

    and we write $ \boldsymbol{C}=\boldsymbol{AB}$ .

  • Example

    \begin{displaymath}
\left[
\begin{array}{cc}
1 & 7 \\
9 & 3 \\
2 & 5\end{arra...
...\
54 & 93 & 123 & 42 \\
12 & 25 & 62 & 31\end{array}\right]
\end{displaymath}

  • Example: Consider the Beer-Lambert law for $ m$ constituents and $ k$ wavelengths plus noise:

    $\displaystyle x_{j}=\sum_{\ell =1}^{m}y_{\ell }a_{\ell j}+e_{j},
$

    for wavelengths numbered $ j=1,\ldots ,k$ . This equation may be written in matrix notation as follows:

    $\displaystyle \boldsymbol{x}=\boldsymbol{yA}+\boldsymbol{e},
$

    where

    • $ \boldsymbol{x}=[x_{1}\ldots x_{k}]$ is $ 1\times k$ ,

    • $ \boldsymbol{y}=[y_{1}\ldots y_{m}]$ is $ 1\times m$ ,

    • $ \boldsymbol{A}=\{a_{\ell j}\}$ is $ m\times k$ ,

    • $ \boldsymbol{e}=[e_{1}\ldots e_{k}]$ is $ 1\times k$ .

  • In particular, the matrix product of two square matrices of common dimension is a square matrix of that dimension.

  • Multiplication of matrices is not commutative, that is, generally $ \boldsymbol{AB}\neq \boldsymbol{BA}$ , even if the second product is allowable.

  • Associative rule for matrix product:.

    $\displaystyle (\boldsymbol{AB})\boldsymbol{C}=\boldsymbol{A}(\boldsymbol{BC})=\boldsymbol{ABC}
$

    where the last form may be used due to the equality of the two first.

  • Distributive rule of matrix multiplication and addition:

    $\displaystyle \boldsymbol{A}(\boldsymbol{B+C})=\boldsymbol{AB}+\boldsymbol{AC}
$

  • Let $ \boldsymbol{a}$ and $ \boldsymbol{b}$ be column vectors of common dimension. The inner product (dot product) of $ \boldsymbol{a}$ and $ \boldsymbol{b}$ is defined by

    $\displaystyle \boldsymbol{a}\cdot \boldsymbol{b}=\boldsymbol{a}^{\top }\boldsymbol{b=b}^{\top }\boldsymbol{a}
$

    The result is a scalar. Note that each element in the matrix $ \boldsymbol{AB}
$ is the result of an inner product between a row of $ \boldsymbol{A}$ and a column of $ \boldsymbol{B}$ . For example, let \begin{displaymath}\boldsymbol{a}=\left[
\begin{array}{c}
9 \\
3\end{array}\right] \end{displaymath} be the second row of the first matrix above (represented as a column) and \begin{displaymath}\boldsymbol{b}=\left[
\begin{array}{c}
11 \\
8\end{array}\right] \end{displaymath} be the third column of the second matrix above. Then the (2,3) entry of the product is
    $\displaystyle \boldsymbol{a}\cdot \boldsymbol{b}$ $\displaystyle =$ $\displaystyle 9\cdot 11+3\cdot 8$  
      $\displaystyle =$ $\displaystyle 123$  

  • An $ n\times 1$ one vector $ \boldsymbol{1}$ is also known as a summing vector, because $ \boldsymbol{1}^{\top }\boldsymbol{x}$ is the sum of the elements of $ \boldsymbol{x}$ .

  • Let $ \boldsymbol{a}$ and $ \boldsymbol{b}$ be column vectors of the dimensions $ m$ and $ n$ , respectively. The outer product of $ \boldsymbol{a}$ and $ \boldsymbol{b}$ is the $ m\times n$ matrix defined by

    $\displaystyle \boldsymbol{ab}^{\top }=\{a_{i}b_{j}\}
$

    It may be thought of as the matrix product of a column vector and a row vector, which are always conformable for multiplication, because they have the dimension 1 in common.

  • Inverse matrix: A non-singular square matrix $ \boldsymbol{A}$ has a unique inverse matrix $ \boldsymbol{A}^{-1}$ (square with the same dimension as $ \boldsymbol{A}$ ), which satisfies

    $\displaystyle \boldsymbol{AA}^{-1}=\boldsymbol{A}^{-1}\boldsymbol{A}=\boldsymbol{I}.
$

    $ \boldsymbol{A}$ is then said to be invertible. See Examples for various illustrations of how the inverse of a matrix is calculated.

  • Inverse of a product: For $ \boldsymbol{A}$ and $ \boldsymbol{B}$ invertible square matrices of the same size:

    $\displaystyle \left( \boldsymbol{AB}\right) ^{-1}=\boldsymbol{B}^{-1}\boldsymbol{A}^{-1}
$

  • Transpose of product: For $ \boldsymbol{A}$ and $ \boldsymbol{B}$ conformable for multiplication

    $\displaystyle \left( \boldsymbol{AB}\right) ^{\top }=\boldsymbol{B}^{\top }\boldsymbol{A}^{\top }
$

  • Matrix inverse and transpose commute:

    $\displaystyle \left( \boldsymbol{A}^{\top }\right) ^{-1}=\left( \boldsymbol{A}^{-1}\right)
^{\top }=\boldsymbol{A}^{-\top }
$

    where the last notation may be used for convenience to represent either of the two forms.

  • Linear equations: For an invertible square matrix $ \boldsymbol{A}$ and given vector $ \boldsymbol{b}$ , the equation

    $\displaystyle \boldsymbol{Ax=b}
$

    with $ \boldsymbol{x}$ unknown, has unique solution

    $\displaystyle \boldsymbol{x=A}^{-1}\boldsymbol{b}.
$

    For example, for a $ 3\times 3$ matrix, this equation is equivalent to the following system of linear equations in $ x_{1}$ , $ x_{2}$ , $ x_{3}$ :
    $\displaystyle a_{11}x_{1}+a_{12}x_{2}+a_{13}x_{3}$ $\displaystyle =$ $\displaystyle b_{1}$  
    $\displaystyle a_{21}x_{1}+a_{22}x_{2}+a_{23}x_{3}$ $\displaystyle =$ $\displaystyle b_{2}$  
    $\displaystyle a_{31}x_{1}+a_{32}x_{2}+a_{33}x_{3}$ $\displaystyle =$ $\displaystyle b_{3}$  

  • If $ \boldsymbol{A}$ is square, but not invertible, then any matrix $ \boldsymbol{A}^{-}$ that satisfies

    $\displaystyle \boldsymbol{AA}^{-}\boldsymbol{A}=\boldsymbol{A}
$

    is called a generalized inverse of $ \boldsymbol{A}$ . There are in general many such matrices $ \boldsymbol{A}^{-}$ . The Moore-Penrose inverse $ \boldsymbol{A}^{-}$ (unique) is further assumed to satisfy
    $\displaystyle \boldsymbol{A}^{-}\boldsymbol{AA}^{-}$ $\displaystyle =$ $\displaystyle \boldsymbol{A}^{-}$  
    $\displaystyle \left( \boldsymbol{AA}^{-}\right) ^{\top }$ $\displaystyle =$ $\displaystyle \boldsymbol{AA}^{-}$  
    $\displaystyle \left( \boldsymbol{A}^{-}\boldsymbol{A}\right) ^{\top }$ $\displaystyle =$ $\displaystyle \boldsymbol{A}^{-}\boldsymbol{A}$  

  • Let $ \boldsymbol{A}$ be an $ n\times k$ matrix of rank $ k\leq n$ . Then the $ k\times n$ matrix

    $\displaystyle \boldsymbol{A}^{+}=\left( \boldsymbol{A}^{\top }\boldsymbol{A}\right) ^{-1}\boldsymbol{A}^{\top }
$

    is called the left pseudo-inverse of $ \boldsymbol{A}$ . Note that the rank condition on $ \boldsymbol{A}$ guarantees that the $ k\times k$ matrix $ \boldsymbol{A}^{\top }\boldsymbol{A}$ is invertible. $ \boldsymbol{A}^{+}$ satisfies the condition

    $\displaystyle \boldsymbol{A}^{+}\boldsymbol{A}=\boldsymbol{I}_{k}
$

  • Similarly, let $ \boldsymbol{A}$ be an $ n\times k$ matrix of rank $ n\leq k$ . Then the $ k\times n$ matrix

    $\displaystyle \boldsymbol{A}^{\vdash }=\boldsymbol{A}^{\top }\left( \boldsymbol{AA}^{\top
}\right) ^{-1}
$

    is called the right pseudo-inverse of $ \boldsymbol{A}$ . Then $ \boldsymbol{A}^{\vdash }$ satisfies the condition

    $\displaystyle \boldsymbol{AA}^{\vdash }=\boldsymbol{I}_{n}
$




2.5 Norm and trace

Previous Section
Next Section

  • Vector norm (length): Consider the $ n$ -vector

    \begin{displaymath}
\boldsymbol{x}=\left[
\begin{array}{c}
x_{1} \\
\vdots \\
x_{n}\end{array}\right] .
\end{displaymath}

    The norm of $ \boldsymbol{x}$ is defined by
    $\displaystyle \left\Vert \boldsymbol{x}\right\Vert$ $\displaystyle =$ $\displaystyle \sqrt{\boldsymbol{x}^{\top }\boldsymbol{x}}$  
      $\displaystyle =$ $\displaystyle \sqrt{x_{1}^{2}+\cdots +x_{n}^{2}}.$  

    The norm of $ \boldsymbol{x}$ is the length of the line going from $ \boldsymbol{0}$ to the point in $ n$ -space indicated by $ \boldsymbol{x}$ .

  • A vector of norm 1 is called a unit vector. The unit vector in the direction of $ \boldsymbol{x}$ is defined by standardization

    $\displaystyle \boldsymbol{e}=\frac{\boldsymbol{x}}{\left\Vert \boldsymbol{x}\right\Vert }.
$

  • The trace $ \mathrm{tr}$ of a square matrix is the sum of its diagonal elements, e.g.

    \begin{displaymath}
\mathrm{tr}\left[
\begin{array}{cc}
1 & 2 \\
3 & 4\end{array}\right] =1+4=5.
\end{displaymath}

  • Trace operator satisfies a kind of commutative property:

    $\displaystyle \mathrm{tr}(\boldsymbol{AB})=\mathrm{tr}(\boldsymbol{BA})
$

    for matrices $ \boldsymbol{A}$ and $ \boldsymbol{B}$ such that $ \boldsymbol{AB}
$ and $ \boldsymbol{BA}$ both exist (i.e. $ \boldsymbol{A}$ and $ \boldsymbol{B}^{\top }$ have common dimension).




2.6 Orthogonal vectors, matrices and projection

Previous Section
Next Section

  • The two $ n$ -vectors $ \boldsymbol{x}$ and $ \boldsymbol{y}$ are said to be orthogonal if their inner product is zero,

    $\displaystyle \boldsymbol{x}^{\top }\boldsymbol{y}=0.
$

    For example, the matrix

    \begin{displaymath}
\left[
\begin{array}{cc}
1 & -1 \\
0 & 5 \\
1 & 1\end{array}\right]
\end{displaymath}

    has orthogonal columns.

  • A set of vectors is said to be an orthogonal set if the vectors are pairwise orthogonal. An orthogonal set consisting of unit vectors, is called an orthonormal set.

  • If

    $\displaystyle \boldsymbol{X}^{\top }\boldsymbol{Y}=\boldsymbol{0}
$

    the columns of $ \boldsymbol{X}$ are all orthogonal to those of $ \boldsymbol{Y}$ .

  • A matrix $ \boldsymbol{A}$ is called orthogonal if the columns of $ \boldsymbol{A}$ are orthonormal, that is,

    $\displaystyle \boldsymbol{A}^{\top }\boldsymbol{A}=\boldsymbol{I}
$

    It would have been more logical to call $ \boldsymbol{A}$ an orthonormal matrix, but the above terminology is standard in linear algebra. For example, the matrix

    \begin{displaymath}
\left[
\begin{array}{cc}
0.7071 & -0.1925 \\
0 & 0.9623 \\
0.7071 & 0.1925\end{array}\right]
\end{displaymath}

    is orthogonal.

  • If a square matrix $ \boldsymbol{A}$ is orthogonal, then so is $ \boldsymbol{A}^{\top }$ , and $ \boldsymbol{A}^{-1}=\boldsymbol{A}^{\top }$ .

  • The orthogonal projection of the $ n$ -vector $ \boldsymbol{y}$ onto the linear subspace spanned by the columns of the $ n\times k$ matrix $ \boldsymbol{X}$ is $ \boldsymbol{Hy}$ , where

    $\displaystyle \boldsymbol{H}=\boldsymbol{X}\left( \boldsymbol{X}^{\top }\boldsymbol{X}\right) ^{-1}\boldsymbol{X}^{\top }
$

    which requires that $ \mathrm{rank\,}\boldsymbol{X}=k$ .

  • Note the following useful properties of rank:
    $\displaystyle \mathrm{rank\,}\boldsymbol{X}$ $\displaystyle =$ $\displaystyle \mathrm{rank\,}\left( \boldsymbol{X}^{\top }\boldsymbol{X}\right)$  
      $\displaystyle =$ $\displaystyle \mathrm{rank\,}\left( \boldsymbol{XX}^{\top }\right)$   .  

  • Matrix $ \boldsymbol{H}$ is a projection matrix, also known in statistics as a hat matrix. It is symmetric and idempotent, that is, it satisfies

    $\displaystyle \boldsymbol{H}^{2}=\boldsymbol{HH}=\boldsymbol{H}$.$\displaystyle $

    Its trace is

    $\displaystyle \mathrm{tr}(\boldsymbol{H})=k$.$\displaystyle $

  • In the simplest case, the projection of $ \boldsymbol{y}$ on $ \boldsymbol{x}$ is

    $\displaystyle \left( \boldsymbol{e}^{\top }\boldsymbol{y}\right) \boldsymbol{e}$,$\displaystyle $

    where $ \boldsymbol{e}=\frac{\boldsymbol{x}}{\left\Vert \boldsymbol{x}\right\Vert }$ is the unit vector in the direction of $ \boldsymbol{x}$ .

  • Orthonormalization of three vectors $ \boldsymbol{x}_{1}$ , $ \boldsymbol{x}_{2}$ and $ \boldsymbol{x}_{3}$ : First form $ \boldsymbol{e}_{1}$ , the unit vector in the direction of $ \boldsymbol{x}_{1}$ . Then form

    $\displaystyle \boldsymbol{x}_{2}^{\prime }=\boldsymbol{x}_{2}-\left( \boldsymbol{x}_{2}^{\top }\boldsymbol{e}_{1}\right) \boldsymbol{e}_{1}$,$\displaystyle $

    and standardize $ \boldsymbol{x}_{2}^{\prime }$ to $ \boldsymbol{e}_{2}$ . The third step is to form

    $\displaystyle \boldsymbol{x}_{3}^{\prime }=\boldsymbol{x}_{3}-\left( \boldsymbo...
...}-\left( \boldsymbol{x}_{3}^{\top }\boldsymbol{e}_{2}\right) \boldsymbol{e}_{2}$,$\displaystyle $

    and standardize $ \boldsymbol{x}_{3}^{\prime }$ to $ \boldsymbol{e}_{3}$ . The set $ \boldsymbol{e}_{1}$ , $ \boldsymbol{e}_{2}$ and $ \boldsymbol{e}_{3}$ is then orthonormal, and spans the same linear subspace as $ \boldsymbol{x}_{1}$ , $ \boldsymbol{x}_{2}$ and $ \boldsymbol{x}_{3}$ . The extension to more than three vectors is straightforward.




2.7 Eigenvalues, determinant and condition number

Previous Section
Next Section

  • For any $ n\times n$ square matrix $ \boldsymbol{A}$ , a number $ \lambda $ such that there exists a non-zero vector $ \boldsymbol{p}$ such that

    $\displaystyle \boldsymbol{Ap}=\lambda \boldsymbol{p}
$

    is called an eigenvalue for $ \boldsymbol{A}$ , and $ \boldsymbol{p}$ is called the corresponding eigenvector.

  • Any non-zero scalar multiple of $ \boldsymbol{p}$ is then also an eigenvector for the same eigenvalue; so we normally assume that an eigenvector is standardized to be a unit vector.

  • Assume now that $ \boldsymbol{A}$ is symmetric, then we may take its eigenvectors $ \boldsymbol{p}_{1}\boldsymbol{,\ldots ,p}_{n}$ to be orthonormal. Let $ \lambda _{1}\boldsymbol{,\ldots ,}\lambda _{n}$ denote the corresponding eigenvalues, counting multiplicities. If we let $ \boldsymbol{p}_{1}\boldsymbol{,\ldots ,p}_{n}$ form the columns of an orthogonal matrix $ \boldsymbol{P}$ , then the set of equations

    $\displaystyle \boldsymbol{Ap}_{i}=\lambda _{i}\boldsymbol{p}_{i},$ $\displaystyle i=1,\boldsymbol{\ldots },n
$

    may be written in matrix form as follows:

    $\displaystyle \boldsymbol{AP}=\boldsymbol{P\Lambda ,}
$

    where $ \boldsymbol{\Lambda }=\mathrm{diag}\{\lambda _{1},\boldsymbol{\ldots },\lambda _{n}\}.$

  • Since $ \boldsymbol{P}$ is orthogonal, we have $ \boldsymbol{P}^{\top }=\boldsymbol{P}^{-1}$ , so by right-multiplying the equation by $ \boldsymbol{P}^{\top }$ we obtain

    $\displaystyle \boldsymbol{A}=\boldsymbol{P\Lambda P}^{\top }\boldsymbol{.}
$

    This is known as the eigenvalue decomposition of $ \boldsymbol{A}$ , or the spectral decomposition. A more explicit way of writing the eigenvalue decomposition is as follows:

    $\displaystyle \boldsymbol{A}=\sum_{i=1}^{n}\lambda _{i}\boldsymbol{p}_{i}\boldsymbol{p}_{i}^{\top },
$

    involving the outer products $ \boldsymbol{p}_{i}\boldsymbol{p}_{i}^{\top }$ .

  • When the eigenvalues $ \lambda _{1}\boldsymbol{,\ldots ,}\lambda _{n}$ are all positive, $ \boldsymbol{A}$ is called positive-definite, and if they are all non-negative, then $ \boldsymbol{A}$ is called positive semi-definite. If there are both positive and negative eigenvalues, then $ \boldsymbol{A}$ is called indefinite.

  • The determinant of $ \boldsymbol{A}$ may be defined by

    $\displaystyle \det \boldsymbol{A}=\prod_{i=1}^{n} \lambda _{i}
$

  • The matrix $ \boldsymbol{A}$ is non-singular if and only if $ \det
\boldsymbol{A}\neq 0$ .

  • The trace of $ \boldsymbol{A}$ is the sum of its eigenvalues:

    $\displaystyle \mathrm{tr}\boldsymbol{A}=\sum_{i=1}^{n}\lambda _{i}
$

  • The matrix condition number of a square matrix $ \boldsymbol{A}
$ is the ratio $ \lambda _{\max }/\lambda _{\min }$ of the largest to smallest eigenvalue of $ \boldsymbol{A}$ . A large condition number (bigger than 100, say) means that the solution to the equation

    $\displaystyle \boldsymbol{Ax=b}
$

    is very sensitive to small changes in $ \boldsymbol{b}$ , and similarly for calculating the inverse of $ \boldsymbol{A}$ , and may tend to give numerical problems when calculating a solution. In such cases, the problem, or the matrix $ \boldsymbol{A}$ , are said to be ill-conditioned.

  • If $ \boldsymbol{X}$ is an $ n\times k$ matrix of rank $ k\leq n$ , then a large matrix condition number for the symmetric matrix $ \boldsymbol{X}^{\top
}\boldsymbol{X}$ is said to indicate an ill-conditioned $ \boldsymbol{X}$ .

Bibliography

6
Jørgensen, J. Printz, P. and Winge, P. (1977). Lineær algebra. København: Gad. [Elementary introduction to linear algebra, in Danish.]

6
Leon, S.J. (1998). Linear Algebra With Applications. Prentice Hall, Upper Saddle River. [Linear algebra textbook, directed at math students.]

3
Matrix Algebra. S.O.S. Mathematics.

4
Searle, S.R. (1982). Matrix Algebra Useful for Statistics. Wiley, New York.

5
Searle, S.R. (1998). Matrix Algebra. Encyclopedia of Biostatistics, Vol. 3, 2456-2474.

6
Wise, B.M. and Gallagher, N.B. (1998). An Introduction to Linear Algebra for use with Matlab.

HOME | Back

Last modified January 29, 2007. Webmaster
©2001-2005 Master Of Applied Statistics