• PCA classifies a sample by its distances to the sample means.

  • Instead of using the Euclidean distances in the feature space directly, it uses distances weighted by the covariance matrices.

  • The first principal component of a sample vector lies along the direction of largest variance over all samples, which is the eigenvector of the sample covariance matrix with the largest eigenvalue.
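
The points above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not a definitive implementation: the notes do not name the exact weighting, so the sketch assumes the covariance-weighted distance is the Mahalanobis distance to each class mean. The function names and the synthetic two-class data are hypothetical.

```python
import numpy as np


def first_principal_component(samples):
    """Unit vector along the direction of largest variance in the rows of `samples`."""
    cov = np.cov(samples, rowvar=False)      # rows are samples, columns are features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue


def covariance_weighted_distance(x, samples):
    """Distance from x to the sample mean, weighted by the sample covariance.

    Assumption: the weighting is the Mahalanobis distance.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    diff = x - mean
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def classify(x, classes):
    """Return the label of the class whose mean is nearest in the weighted distance."""
    return min(classes, key=lambda label: covariance_weighted_distance(x, classes[label]))


# Hypothetical synthetic data: two classes with different spreads and means.
rng = np.random.default_rng(0)
class_a = rng.normal(size=(200, 2)) * [3.0, 0.5]               # wide along the first axis
class_b = rng.normal(size=(200, 2)) * [0.5, 3.0] + [6.0, 6.0]  # wide along the second axis
classes = {"a": class_a, "b": class_b}

pc = first_principal_component(class_a)
label = classify(np.array([0.5, 0.2]), classes)
```

Here `pc` comes out nearly parallel to the first feature axis, because class "a" varies most in that direction, and the query point close to the origin is assigned to class "a". The sign of `pc` is arbitrary, since an eigenvector and its negative describe the same direction.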