StatQuest

Principal Component Analysis (PCA) reduces the dimensionality of data while minimizing information loss.

By projecting the data points onto the eigenvector of the covariance matrix with the largest eigenvalue, we minimize the information loss.

This can be proven mathematically.
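One common version of that proof, sketched with Lagrange multipliers under assumed notation: $\mathbf{x}_i$ are the $n$ mean-centered points, $\mathbf{u}$ is a unit vector along the line, and $C = \frac{1}{n-1} X^\top X$ is the sample covariance matrix, where the rows of $X$ are the $\mathbf{x}_i^\top$.

```latex
% Maximize the spread of the projections onto a unit vector u
\max_{\mathbf{u}^\top \mathbf{u} = 1} \; \mathbf{u}^\top C \mathbf{u},
\qquad \mathbf{u}^\top C \mathbf{u} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{u}^\top \mathbf{x}_i)^2

% Lagrangian and stationarity condition
\mathcal{L}(\mathbf{u}, \lambda) = \mathbf{u}^\top C \mathbf{u} - \lambda (\mathbf{u}^\top \mathbf{u} - 1),
\qquad \nabla_{\mathbf{u}} \mathcal{L} = 2 C \mathbf{u} - 2 \lambda \mathbf{u} = 0
\;\Longrightarrow\; C \mathbf{u} = \lambda \mathbf{u}

% At such a u the objective equals the eigenvalue
\mathbf{u}^\top C \mathbf{u} = \lambda \, \mathbf{u}^\top \mathbf{u} = \lambda
```

So the stationary directions are exactly the unit eigenvectors of $C$, and the one with the largest eigenvalue gives the largest spread of the projected points.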

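In short, PCA finds the line that minimizes the distances from the data points to their projections on the line or, equivalently, maximizes the distances from the projected points to the origin (after centering, the origin is the mean of the data). The two goals are equivalent by the Pythagorean theorem: each point's distance to the origin is fixed, and its square equals the squared distance to the line plus the squared distance from the projected point to the origin.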
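To check the Pythagorean relation numerically, here is a minimal pure-Python sketch for one centered 2-D point; the helper `decompose` and the sample point are hypothetical, chosen only for illustration.

```python
import math

def decompose(point, direction):
    """Return (projection length, distance to line) for a centered 2-D point
    and a line through the origin along `direction`."""
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm  # unit vector along the line
    proj = point[0] * ux + point[1] * uy  # signed distance from origin to projected point
    rx, ry = point[0] - proj * ux, point[1] - proj * uy  # projected point -> data point
    return proj, math.hypot(rx, ry)

point = (3.0, 1.0)  # hypothetical centered data point, |point|^2 = 10
for direction in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
    proj, dist = decompose(point, direction)
    # proj^2 + dist^2 is always |point|^2, so raising one lowers the other.
    print(direction, proj ** 2, dist ** 2, proj ** 2 + dist ** 2)
```

Whatever line is chosen, the two squared lengths add up to the same total, so maximizing one is the same as minimizing the other.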

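But it is easier to maximize the sum of squared distances from the projected points to the origin, i.e. $\sum_{i=1}^{n} (\mathbf{u}^\top \mathbf{x}_i)^2$, which equals $\mathbf{u}^\top X^\top X \mathbf{u}$. Here $\mathbf{x}_i$ are the mean-centered data points, $X$ is the matrix whose rows are the $\mathbf{x}_i^\top$, and $\mathbf{u}$ is a unit vector along the line.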
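A small pure-Python sketch, assuming 2-D mean-centered points (the data and helper names are made up for illustration): it scans many line directions by brute force and compares the best sum of squares with the largest eigenvalue of $X^\top X$, computed with the closed-form formula for a symmetric 2×2 matrix.

```python
import math

def ss_projected(points, angle):
    """Sum of squared distances from the origin to the projections of the
    (already centered) points onto the unit vector at `angle` radians."""
    ux, uy = math.cos(angle), math.sin(angle)
    return sum((x * ux + y * uy) ** 2 for x, y in points)

def top_eigen_2x2(a, b, c):
    """Largest eigenvalue and its unit eigenvector of [[a, b], [b, c]]."""
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if abs(b) > 1e-12:
        vx, vy = b, lam - a  # solves [[a, b], [b, c]] v = lam v when b != 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # already diagonal
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

# Hypothetical, already mean-centered 2-D points (each column sums to 0).
points = [(-2.0, -1.0), (-1.0, -1.0), (0.0, 0.5), (1.0, 0.5), (2.0, 1.0)]
# Entries of X^T X, where the rows of X are the centered points.
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)
syy = sum(y * y for _, y in points)
lam, (ux, uy) = top_eigen_2x2(sxx, sxy, syy)

# Brute force: scan line angles in [0, pi) and keep the largest sum of squares.
best = max(ss_projected(points, k * math.pi / 3600) for k in range(3600))
print(lam, best, ss_projected(points, math.atan2(uy, ux)))
```

The brute-force maximum matches the largest eigenvalue of $X^\top X$ up to the angle grid's resolution. Dividing by $n-1$ turns it into the largest eigenvalue of the covariance matrix, which selects the same direction.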

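If we already know the PCA result for a set of points and add the same constant vector to every point, the result of PCA is unchanged, because the data are centered first. If instead every coordinate is multiplied by the same constant $c$, the principal directions stay the same, the projected coordinates are multiplied by $c$, and the eigenvalues are multiplied by $c^2$. (Scaling different axes by different factors generally does change the directions.)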
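A quick numerical check of both properties, as a minimal pure-Python sketch for 2-D data; `pca_2d` and the sample points are hypothetical, and the sample covariance with divisor $n-1$ is an assumption.

```python
import math

def pca_2d(points):
    """Return (eigenvalues, eigenvectors, scores) for 2-D points, using the
    sample covariance (divide by n - 1). Eigenvectors are sorted by
    descending eigenvalue; scores are the centered points projected onto them."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = (a + c) / 2 + half_gap, (a + c) / 2 - half_gap
    # Unit eigenvector for lam1; the second one is perpendicular to it.
    if abs(b) > 1e-12:
        v = (b, lam1 - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])
    scores = [(x * u1[0] + y * u1[1], x * u2[0] + y * u2[1]) for x, y in centered]
    return (lam1, lam2), (u1, u2), scores

points = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 6.0)]  # hypothetical data
base = pca_2d(points)
shifted = pca_2d([(x + 10.0, y - 3.0) for x, y in points])  # add a constant vector
scaled = pca_2d([(2.0 * x, 2.0 * y) for x, y in points])    # multiply every coordinate by 2
print(base[0], shifted[0], scaled[0])
```

The shifted data give the same eigenvalues, directions, and scores; the scaled data keep the directions, double the scores, and quadruple the eigenvalues.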

Steps

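1. Compute the mean vector $\bar{\mathbf{p}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{p}_i$ of the data points $\mathbf{p}_1, \dots, \mathbf{p}_n$.
2. Compute the difference of each data point from the mean vector (the point minus the mean vector), $\mathbf{x}_i = \mathbf{p}_i - \bar{\mathbf{p}}$.
3. Compute the covariance matrix $C = \frac{1}{n-1} X^\top X$, where $X$ is the matrix form of the centered data points (row $i$ is $\mathbf{x}_i^\top$).
4. Compute the eigenvalues and unit-length eigenvectors of $C$.
5. Arrange the eigenvectors in descending order of their eigenvalues, as the columns of a matrix $W$.
6. Transform the data points by the eigenvector matrix: $\mathbf{y}_i = W^\top \mathbf{p}_i$.
7. Subtract the mean of the transformed points from each transformed point (the point minus the mean vector); the result is $W^\top \mathbf{x}_i$.
8. For each data point, keep only the value corresponding to the largest eigenvalue.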
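A minimal pure-Python sketch of the eight steps, assuming 2-D data reduced to one dimension and the sample covariance with divisor $n-1$; the function name and data points are hypothetical. The 2×2 eigen decomposition uses the closed-form formula instead of a library call, so the block needs only the standard library.

```python
import math

def pca_steps_2d(points):
    """Follow steps 1-8 for 2-D points and return (eigenvalues, first
    eigenvector, 1-D scores along that eigenvector)."""
    n = len(points)
    # 1. Mean vector of the data points.
    mean = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    # 2. Difference of each point from the mean vector.
    diffs = [(x - mean[0], y - mean[1]) for x, y in points]
    # 3. Sample covariance C = X^T X / (n - 1), stored as [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in diffs) / (n - 1)
    b = sum(dx * dy for dx, dy in diffs) / (n - 1)
    c = sum(dy * dy for _, dy in diffs) / (n - 1)
    # 4. Eigenvalues via the closed form for a symmetric 2x2 matrix.
    #    5. The "+" root is always the larger, so the list is already descending.
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigvals = [(a + c) / 2 + half_gap, (a + c) / 2 - half_gap]
    # Unit eigenvector for the larger eigenvalue; (b, lam - a) solves C v = lam v when b != 0.
    if abs(b) > 1e-12:
        v = (b, eigvals[0] - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])  # perpendicular unit eigenvector for the smaller eigenvalue
    # 6. Transform the original points by W = [u1 u2]: y_i = W^T p_i.
    transformed = [(x * u1[0] + y * u1[1], x * u2[0] + y * u2[1]) for x, y in points]
    # 7. Subtract the mean of the transformed points.
    tx = sum(t[0] for t in transformed) / n
    ty = sum(t[1] for t in transformed) / n
    centered = [(t0 - tx, t1 - ty) for t0, t1 in transformed]
    # 8. Keep only the coordinate that belongs to the larger eigenvalue.
    return eigvals, u1, [t0 for t0, _ in centered]

points = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 6.0)]  # hypothetical data
eigvals, u1, scores = pca_steps_2d(points)
print(eigvals, u1, scores)
```

Steps 6 and 7 together give the same scores as projecting the centered points from step 2 directly, since $W^\top \mathbf{p}_i - W^\top \bar{\mathbf{p}} = W^\top \mathbf{x}_i$. For higher-dimensional data, a general symmetric eigensolver such as `numpy.linalg.eigh` would replace the closed form (it returns eigenvalues in ascending order, so they would need reversing for step 5).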