The Fisher Sample Linear Discriminant Equations

Assume that the input data matrix $ \mathbf{X} \in \mathbb{R}^{n
\times p}$ contains the samples from $ k$ different populations $ \pi_i,\;\; i=1,\hdots, k$. A vital assumption made when applying the LDA method is that the $ p
\times p$ covariance matrices of the $ k$ populations are equal and of full rank, i.e.

$\displaystyle \boldsymbol{\Sigma}_1=\hdots=\boldsymbol{\Sigma}_k=\boldsymbol{\Sigma}.$    

If these matrices are not of full rank, they can be replaced by

$\displaystyle \mathbf{P}^T\boldsymbol{\Sigma} \mathbf{P}$    

where $ \mathbf{P}=[\mathbf{e}_1,\hdots,\mathbf{e}_q]$ contains the $ q$ eigenvectors corresponding to the $ q$ nonzero eigenvalues of the covariance matrix $ \boldsymbol{\Sigma}$ [1]. The expected value for population $ i$ is denoted $ \boldsymbol{\mu}_i$.
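For illustration, the rank-deficiency fix above can be sketched in NumPy; the covariance matrix below is an invented rank-2 example in $ \mathbb{R}^3$:

```python
import numpy as np

# Invented covariance matrix of rank 2 in R^3 (third variable is constant).
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])

# Eigendecomposition of the symmetric matrix Sigma.
vals, vecs = np.linalg.eigh(Sigma)

# P = [e_1, ..., e_q]: eigenvectors of the q nonzero eigenvalues.
P = vecs[:, vals > 1e-10]

# Full-rank q x q replacement P^T Sigma P.
Sigma_reduced = P.T @ Sigma @ P
```

The reduced $ q \times q$ matrix is then invertible, which is what the later eigenproblem requires.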

Consider the linear combination given by

$\displaystyle Y=\mathbf{a}^T \mathbf{X}.$ (1)

From the assumptions above, we know that the expected value of $ Y$ for population $ i$ is $ \mathbf{a}^T \boldsymbol{\mu}_i$, while its variance $ \mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a}$ is the same for all populations. The goal of this method is to separate the populations as much as possible; we therefore try to maximize the ratio

$\displaystyle \frac{ \left( \begin{array}{l}
 \mbox{Sum of squared distances from population}\\
 \mbox{means of } Y \mbox{ to the overall mean of } Y
 \end{array} \right)}{(\mbox{Variance of }Y)}$ $\displaystyle =
 \frac{\sum_{i=1}^k(\mathbf{a}^T \boldsymbol{\mu}_i-\mathbf{a}^T
 \bar{\boldsymbol{\mu}})^2}{\mathbf{a}^T \boldsymbol{\Sigma}
 \mathbf{a}}$    
  $\displaystyle = \frac{\mathbf{a}^T \left( \sum_{i=1}^k
 (\boldsymbol{\mu}_i-\bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_i-\bar{\boldsymbol{\mu}})^T
 \right) \mathbf{a}}{\mathbf{a}^T \boldsymbol{\Sigma}\mathbf{a}}$    
  $\displaystyle = \frac{\mathbf{a}^T \mathbf{B_{\boldsymbol{\mu}}a}}{\mathbf{a}^T \boldsymbol{\Sigma}\mathbf{a}},$    

where $ \bar{\boldsymbol{\mu}}$ is the overall mean of the population means, and the matrices $ \mathbf{B}_{\boldsymbol{\mu}}$ and $ \boldsymbol{\Sigma}$ have to be estimated from the training data set.

Letting $ \mathbf{X}_i \in \mathbb{R}^{n_i \times p}$ denote the sample data set from population $ i$, we define the sample mean vector as

$\displaystyle \bar{\mathbf{x}}_i=\frac{1}{n_i} \sum_{j=1}^{n_i} \mathbf{x}_{ij},$    

and sample covariance matrix

$\displaystyle \mathbf{S}_i=\frac{1}{n_i-1} \sum_{j=1}^{n_i}
 (\mathbf{x}_{ij}-\bar{\mathbf{x}}_i)(\mathbf{x}_{ij}-\bar{\mathbf{x}}_i)^T,
 \;\;i=1,\hdots,k.$    

We also define the overall mean as

$\displaystyle \bar{\mathbf{x}}=\frac{\sum_{i=1}^k n_i \bar{\mathbf{x}}_i}{n},$    

where $ n=\sum_{i=1}^k n_i$ is the total number of samples.

The between-groups sample covariance matrix is then given by

$\displaystyle \mathbf{B}=\frac{\sum_{i=1}^k n_i (\bar{\mathbf{x}}_i-\bar{\mathbf{x}})(\bar{\mathbf{x}}_i-\bar{\mathbf{x}})^T}{k-1},$ (2)

and the within-groups (pooled) sample covariance matrix as

$\displaystyle \mathbf{S}_p=\frac{\sum_{i=1}^k (n_i-1) \mathbf{S}_i}{n-k}.$ (3)

The rank of $ \mathbf{B}$ is at most $ \min (p,k-1)$ [2].
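As a concrete check on these estimators, the quantities above can be computed directly with NumPy; the class sizes and means below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p, sizes = 3, [10, 12, 8]        # p variables, k = 3 classes of sizes n_i
# Synthetic classes with different (invented) mean levels.
classes = [rng.normal(loc=m, size=(n_i, p))
           for m, n_i in zip([0.0, 2.0, 4.0], sizes)]

k = len(classes)
n = sum(sizes)

xbar_i = [c.mean(axis=0) for c in classes]                # sample mean vectors
S_i = [np.cov(c, rowvar=False) for c in classes]          # sample covariances S_i
xbar = sum(n_i * m for n_i, m in zip(sizes, xbar_i)) / n  # overall mean

# Between-groups covariance matrix B, Eq. (2).
B = sum(n_i * np.outer(m - xbar, m - xbar)
        for n_i, m in zip(sizes, xbar_i)) / (k - 1)

# Pooled within-groups covariance matrix S_p, Eq. (3).
S_p = sum((n_i - 1) * S for n_i, S in zip(sizes, S_i)) / (n - k)
```

Because the $ k$ weighted mean deviations $ n_i(\bar{\mathbf{x}}_i-\bar{\mathbf{x}})$ sum to zero, the rank of $ \mathbf{B}$ here is at most $ k-1=2$ even though $ p=3$, illustrating the rank bound above.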

Now let $ \hat{\lambda}_1, \hdots, \hat{\lambda}_s$ denote the $ s=\min(
k-1,p)$ nonzero eigenvalues of $ \mathbf{W^{-1}B}$, where $ \mathbf{W}=(n-k)\mathbf{S}_p$ is the within-groups sum-of-squares matrix, and let $ \hat{\mathbf{e}}_1,\hdots, \hat{\mathbf{e}}_s$ be the corresponding eigenvectors, scaled such that $ \hat{\mathbf{e}}_i^T \mathbf{W}
\hat{\mathbf{e}}_i=1,\;\; i=1,\hdots, s$. Then the vector of coefficients $ \hat{\mathbf{a}}$ that maximizes the ratio

$\displaystyle \frac{\hat{\mathbf{a}}^T\mathbf{B}\hat{\mathbf{a}}}{\hat{\mathbf{a}}^T\mathbf{W} \hat{\mathbf{a}}}$ (4)

is given by $ \hat{\mathbf{a}}_1=\hat{\mathbf{e}}_1$. Then

$\displaystyle \hat{y}_1=\hat{\mathbf{e}}_1^T \mathbf{x}$    

is called the first sample discriminant. More generally, the $ i$th sample discriminant is given by

$\displaystyle \hat{y}_i=\hat{\mathbf{e}}_i^T \mathbf{x},\;\; i \leq s.$    
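The eigenvector computation can be sketched in NumPy as follows; the three classes and their mean vectors are invented for illustration. The sketch solves the eigenproblem of $ \mathbf{W}^{-1}\mathbf{B}$ and rescales each eigenvector so that $ \hat{\mathbf{e}}_i^T \mathbf{W} \hat{\mathbf{e}}_i = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
# Three synthetic classes (k = 3) with invented, non-collinear mean vectors.
mus = [np.zeros(p),
       np.array([2.0, 0.0, 1.0, 0.0]),
       np.array([0.0, 3.0, 0.0, 1.0])]
classes = [rng.normal(loc=m, size=(20, p)) for m in mus]

k = len(classes)
n = sum(c.shape[0] for c in classes)
means = [c.mean(axis=0) for c in classes]
xbar = sum(c.shape[0] * m for c, m in zip(classes, means)) / n

# Between-groups matrix B (Eq. 2) and within-groups
# sum-of-squares matrix W = (n - k) S_p.
B = sum(c.shape[0] * np.outer(m - xbar, m - xbar)
        for c, m in zip(classes, means)) / (k - 1)
W = sum((c - m).T @ (c - m) for c, m in zip(classes, means))

# Eigenvectors of W^{-1} B; keep the s = min(k-1, p) leading ones.
vals, vecs = np.linalg.eig(np.linalg.inv(W) @ B)
s = min(k - 1, p)
order = np.argsort(vals.real)[::-1]
E = vecs[:, order[:s]].real

# Rescale each eigenvector so that e_i^T W e_i = 1.
E /= np.sqrt(np.einsum('ji,jk,ki->i', E, W, E))

# Sample discriminants of one observation x: y_i = e_i^T x.
y = E.T @ classes[0][0]
```

Only the direction of each $ \hat{\mathbf{e}}_i$ is determined by the eigenproblem; the $ \hat{\mathbf{e}}_i^T \mathbf{W} \hat{\mathbf{e}}_i=1$ scaling fixes the length.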

A classification rule based on the first $ r \leq s$ sample discriminants is then as follows [1]: allocate $ \mathbf{x}_0$ to population $ \pi_i$ if

$\displaystyle \sum_{j=1}^r (\hat{y}_j-\bar{y}_{ij})^2=\sum_{j=1}^r
 [\hat{\mathbf{e}}_j^T(\mathbf{x}_0-\bar{\mathbf{x}}_i)]^2 \leq \sum_{j=1}^r [\hat{\mathbf{e}}_j^T(\mathbf{x}_0-\bar{\mathbf{x}}_l)]^2, \;\;
 \forall \;l \neq i,$    

where $ \bar{y}_{ij}=\hat{\mathbf{e}}_j^T \bar{\mathbf{x}}_i$.
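Putting the pieces together, the allocation rule can be sketched end to end in NumPy; the two well-separated Gaussian classes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two invented, well-separated classes in R^3.
classes = [rng.normal(loc=[0.0, 0.0, 0.0], scale=0.5, size=(30, 3)),
           rng.normal(loc=[4.0, 4.0, 4.0], scale=0.5, size=(30, 3))]

k, p = len(classes), 3
n = sum(c.shape[0] for c in classes)
means = [c.mean(axis=0) for c in classes]
xbar = sum(c.shape[0] * m for c, m in zip(classes, means)) / n

# Between-groups matrix B and within-groups sum-of-squares matrix W.
B = sum(c.shape[0] * np.outer(m - xbar, m - xbar)
        for c, m in zip(classes, means)) / (k - 1)
W = sum((c - m).T @ (c - m) for c, m in zip(classes, means))

# Leading eigenvectors of W^{-1} B; here s = min(k-1, p) = 1.
vals, vecs = np.linalg.eig(np.linalg.inv(W) @ B)
s = min(k - 1, p)
E = vecs[:, np.argsort(vals.real)[::-1][:s]].real

def allocate(x0, r=s):
    """Allocate x0 to the population i minimizing the squared
    discriminant-space distance sum_j [e_j^T (x0 - xbar_i)]^2."""
    dists = [np.sum((E[:, :r].T @ (x0 - m)) ** 2) for m in means]
    return int(np.argmin(dists))
```

A point near a class mean is allocated to that class: here `allocate` maps points close to the first mean to population $ 0$ and points close to the second mean to population $ 1$.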

Bjørn Kåre Alsberg 2006-04-06