Singular Value Decomposition (SVD) - AAU-Dat/P5-Nonlinear-Dimensionality-Reduction GitHub Wiki

Introduction


The SVD is a widely used, general-purpose tool from linear algebra for data processing.

  • Is used to reduce data to its key features that are useful for analysing, understanding and describing the data.
  • Data-Driven generalization of the Fourier transform.
    • The Fourier transform is based on sine and cosine expansions that approximate functions.
    • The shared idea is to apply a mathematical transformation that maps the data into a new coordinate system in which it becomes simpler to understand.
  • The SVD is "Tailored" to a specific problem.
    • Many mathematical transformations exist for specific types of data; when none fits the data at hand, the SVD can be used to tailor a coordinate system, or transformation, to the data that is available.
  • Widely used in industry, e.g.:
    • Google
    • Facebook
    • Amazon
    • Netflix
  • Based on very simple linear algebra
    • Meaning that whenever there is a data matrix, the SVD can be computed, yielding interpretable and understandable features that can be used.
    • Scalable
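As a minimal sketch of how little is needed in practice, NumPy's `np.linalg.svd` computes the decomposition for any data matrix (the $8 \times 3$ random matrix below is purely illustrative):

```python
import numpy as np

# Illustrative data matrix: 8 measurements per column, 3 columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))

# full_matrices=False gives the economy SVD discussed later on this page.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The three factors reconstruct X up to floating-point error.
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X, X_rebuilt))  # True
```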

Algorithm


The basis of the SVD is that a matrix $X$ can be factored into the product of three simple matrices, as seen below:

$$ X = \left(\begin{array}{cccc} | & | & & | \\ x_1 & x_2 & \cdots & x_m \\ | & | & & | \end{array}\right) = U\Sigma V^T = \left(\begin{array}{cccc} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{array}\right) \left(\begin{array}{ccc} \sigma_1 & & \\ & \ddots & \\ & & \sigma_m \\ & 0 & \end{array}\right) \left(\begin{array}{ccc} & v_1^T & \\ & \vdots & \\ & v_m^T & \end{array}\right) = \hat{U}\hat{\Sigma}V^T $$

Where:

$$ U^TU=I_{n \times n} $$

$$ V^TV=I_{m \times m} $$

(Meaning that $U$ and $V$ are orthogonal)
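The orthogonality of $U$ and $V$ can be checked numerically; a small sketch with NumPy (the $6 \times 4$ matrix is arbitrary example data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))

# Full SVD: U is 6x6, Vt (which is V^T) is 4x4.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# U^T U = I and V^T V = I, i.e. both factors are orthogonal.
print(np.allclose(U.T @ U, np.eye(6)))   # True
print(np.allclose(Vt @ Vt.T, np.eye(4)))  # True
```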

  • Here $X$ is the data matrix, written as the product of three matrices $U\Sigma V^T$: $U$ carries information about the column space of $X$, $V$ carries information about the row space of $X$, and $\Sigma$ is a diagonal matrix whose entries tell how important the corresponding columns of $U$ and $V$ are.

    • Thinking in terms of linear transformations, any linear transformation $T$ has the same effect as a sequence of three simple transformations, therefore $U\Sigma V^T$ can be thought of as $T$ = rotation₂ ∘ stretch ∘ rotation₁
    • The columns are hierarchically arranged so that $u_1$ is more important than $u_2$, and the same holds for $v_1^T$, $v_2^T$, etc. This importance is encoded in the singular values $\sigma_i$.
    • We assume that $n \gg m$, meaning that each column holds more measurements than there are columns.
      • In this case only the first $m$ columns of $U$ and the top $m \times m$ block of $\Sigma$ are needed; keeping only those gives the "Economy SVD" $\hat{U}\hat{\Sigma}V^T$.
  • The matrices $U$, $\Sigma$, and $V^T$ are obtained by decomposing the matrix $X$.

    • Calculating the SVD consists of finding the eigenvalues and eigenvectors of $XX^T$ and $X^TX$. The eigenvectors of $X^TX$ make up the columns of $V$, and the eigenvectors of $XX^T$ make up the columns of $U$. The singular values in $\Sigma$ are the square roots of the eigenvalues of $XX^T$ or $X^TX$; they are the diagonal entries of $\Sigma$, arranged in descending order, and are always real numbers. If the matrix $X$ is real, then $U$ and $V$ are also real.
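The eigenvalue route above can be sketched directly in NumPy. This is only a pedagogical sketch, not how production SVD routines work (they use more numerically stable algorithms), but it mirrors the description: eigenvectors of $X^TX$ give $V$, the square roots of its eigenvalues give the $\sigma_i$, and $u_i = Xv_i/\sigma_i$ recovers $U$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))

# Eigen-decomposition of the symmetric matrix X^T X.
eigvals, V = np.linalg.eigh(X.T @ X)

# eigh returns ascending eigenvalues; singular values go in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Singular values are square roots of the eigenvalues (clip guards tiny negatives).
sigma = np.sqrt(np.clip(eigvals, 0, None))

# Columns of U follow from u_i = X v_i / sigma_i (broadcast over columns).
U = X @ V / sigma

print(np.allclose(U @ np.diag(sigma) @ V.T, X))  # True
```

The reconstruction works because $U\Sigma V^T = XV\Sigma^{-1}\Sigma V^T = XVV^T = X$, with $V$ orthogonal.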

Notice how there are $m$ columns in the $X$ matrix and $n$ columns in the $U$ matrix. Only the first $m$ columns of $U$ matter, because rows $m+1$ through $n$ of the $\Sigma$ matrix are zero, so the remaining columns of $U$ are multiplied by zero in the product.
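This can be verified numerically: a sketch comparing the full SVD (where $U$ is $n \times n$) with the economy SVD (where $\hat{U}$ is $n \times m$), using an arbitrary $7 \times 3$ example matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 7, 3
X = rng.standard_normal((n, m))

U_full, s, Vt = np.linalg.svd(X, full_matrices=True)           # U_full is n x n
U_hat, s_hat, Vt_hat = np.linalg.svd(X, full_matrices=False)   # U_hat is n x m

# Full Sigma is n x m: sigma_1..sigma_m on the diagonal, zeros below.
Sigma_full = np.zeros((n, m))
np.fill_diagonal(Sigma_full, s)

# Both factorizations reconstruct X; the extra columns of U_full hit zeros.
print(np.allclose(U_full @ Sigma_full @ Vt, X))       # True
print(np.allclose(U_hat @ np.diag(s_hat) @ Vt_hat, X))  # True
```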

  • $X = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_m u_m v_m^T + 0$
  • Each term is formed by an "outer product": $\sigma_1$ is multiplied by the first element of $u_1$, and that result is multiplied by each element of $v_1^T$ to give the first row; then $\sigma_1$ times the second element of $u_1$ is multiplied by each element of $v_1^T$ to give the second row, and so on.
    • In other words, every element in the column $u_1$ is multiplied by $\sigma_1$ and then by the entire row $v_1^T$, producing the rank-one matrix $\sigma_1 u_1 v_1^T$.
  • The sum can be "truncated" at a rank $R$ to approximate $X$: the later terms have less and less impact, so when there are many terms they become negligible and the sum can simply be stopped at rank $R$.
    • $X \approx \tilde{U}\tilde{\Sigma}\tilde{V}^T$ (Eckart-Young theorem, 1936: the truncated SVD is the best rank-$R$ approximation of $X$).
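The outer-product sum and its truncation can be sketched as follows (example data is a random $10 \times 4$ matrix; `np.outer` computes one rank-one term $u_i v_i^T$):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Full reconstruction as a sum of rank-one outer products sigma_i * u_i v_i^T.
X_full = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(X_full, X))  # True

# Truncate at rank R = 2: keep only the two largest singular values.
R = 2
X_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]
```

By the Eckart-Young theorem, `X_R` is the best rank-2 approximation of `X` in the least-squares sense.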

Examples


The SVD can be used on many different types of data, as its main purpose is simply to surface the most important aspects of the data. This is why the SVD is often called the beginning step: it is used to find the relevant parts of the data to work on.

Sources


Article

YouTube

YouTube 2

YouTube 3

MIT

Computation of the SVD