# Basic Idea

PCA is a simple linear map. When we consider dimensionality reduction, there are generally two ways to think about it:

• Find a $d$-dimensional affine subspace such that, under a suitable projection, the projected points stay close to the original points. In other words, the projection should preserve the structure of the original data as faithfully as possible.
• Find a $d$-dimensional projection that retains as much of the data's variation (variance) as possible.

## Definitions

### Sample mean

$\mu_n = \frac{1}{n}\sum\limits_{i = 1}^{n}x_i$

### Sample covariance

$\Sigma_n = \frac{1}{n-1}\sum\limits_{i=1}^{n}(x_i - \mu_n)(x_i - \mu_n)^T$

Equivalently, with the samples as the columns of the $p \times n$ matrix $X$ and $\mathbf{1}$ the all-ones vector,

$\Sigma_n = \frac{1}{n-1}(X - \mu_n\mathbf{1}^T)(X - \mu_n\mathbf{1}^T)^T$
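A quick NumPy check of both definitions (a sketch; the code stores samples as rows of `X`, i.e., the transpose of the $p \times n$ convention above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples as rows, 3 features

mu_n = X.mean(axis=0)                    # sample mean
Xc = X - mu_n                            # centered data, the (X - mu_n 1^T) term
Sigma_n = Xc.T @ Xc / (len(X) - 1)       # sample covariance with the 1/(n-1) factor

# agrees with numpy's built-in estimator (which also divides by n-1)
assert np.allclose(Sigma_n, np.cov(X, rowvar=False))
```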

# Intuition

Write $y = Vx$ for the projection of a centered sample $x$. Averaging $yy^T$ over the samples turns $xx^T$ into the sample covariance $\Sigma_n$, so the covariance of the projected data is

$$\begin{aligned} D &=y y^{T} \\ &=V x(V x)^{T} \\ &=V x x^{T} V^{T} \\ &=V \Sigma_{n} V^{T} \end{aligned}$$
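This identity is easy to verify numerically. A sketch (the columns of `V` are the projection directions, so the document's $V$ corresponds to `V.T` in the code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))            # 200 samples, 4 features
Xc = X - X.mean(axis=0)                  # center the data
Sigma_n = np.cov(X, rowvar=False)

# an arbitrary projection with orthonormal directions (via QR)
V, _ = np.linalg.qr(rng.normal(size=(4, 2)))

Y = Xc @ V                               # projected samples
# covariance of the projected data equals V^T Sigma_n V
assert np.allclose(np.cov(Y, rowvar=False), V.T @ Sigma_n @ V)
```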

# PCA as the Best Affine Fit

Approximate each sample by a point in a $d$-dimensional affine subspace spanned by orthonormal directions $v_1, \dots, v_d$:

$x_i \approx \mu + \sum\limits_{j= 1}^{d} \beta_i^jv_j$

In matrix form, $x_i \approx \mu + V\beta_i$, which gives the objective

$\min\limits_{\mu,V,\beta_i,\ V^TV=1} \sum\limits_{i=1}^n||x_i - (\mu + V\beta_i)||^2$

### Optimal value of $\mu$

$\sum_{i=1}^n(x_i - (\mu + V\beta_i)) = 0 \Rightarrow (\sum_{i=1}^n x_i) - n\mu - V(\sum_{i=1}^n \beta_i) = 0$

Since any offset in the coefficients can be absorbed into $\mu$, we may take $\sum_{i=1}^n \beta_i = 0$, giving

$\mu^* = \frac{1}{n}\sum_{i=1}^{n}x_i = \mu_n$

$\min\limits_{V,\beta_i,\ V^TV=1} \sum\limits_{i=1}^n||x_i - (\mu_n + V\beta_i)||^2$

### Optimal value of $\beta_i$

$\min\limits_{\beta_i}||x_i - \mu_n - V\beta_i||^2 = \min\limits_{\beta_i}||x_i - \mu_n - \sum\limits_{j=1}^d\beta_i^jv_j||^2$

$\beta_i^j = v_j^T(x_i - \mu_n)\Rightarrow \beta_i = V^T(x_i - \mu _n)$
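Because $V$ has orthonormal columns, this closed form coincides with the generic least-squares solution; a small check (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_n = rng.normal(size=5)
V, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal columns: V^T V = I
x = rng.normal(size=5)

beta = V.T @ (x - mu_n)                       # closed form from the derivation

# generic least squares for min ||(x - mu_n) - V beta||^2 gives the same answer
beta_ls, *_ = np.linalg.lstsq(V, x - mu_n, rcond=None)
assert np.allclose(beta, beta_ls)
```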

Substituting $\beta_i = V^T(x_i - \mu_n)$ back into the objective:

$\min\limits_{V^TV = 1} \sum\limits_{i= 1 } ^n ||(x_i - \mu_n) - VV^T(x_i - \mu_n)||^2$

### Optimal value of $V$

Using $||x||^2 = x^Tx$ and the orthonormality constraint $V^TV = 1$ (i.e., $V$ has orthonormal columns), we can expand:

$$\begin{aligned} \left\|\left(x_{i}-\mu_{n}\right)-V V^{T}\left(x_{i}-\mu_{n}\right)\right\|^{2} &=\left[\left(x_{i}-\mu_{n}\right)-V V^{T}\left(x_{i}-\mu_{n}\right)\right]^{T}\left[\left(x_{i}-\mu_{n}\right)-V V^{T}\left(x_{i}-\mu_{n}\right)\right] \\ &=\left(x_{i}-\mu_{n}\right)^{T}\left(x_{i}-\mu_{n}\right)-2\left(x_{i}-\mu_{n}\right)^{T} V V^{T}\left(x_{i}-\mu_{n}\right)+\left(x_{i}-\mu_{n}\right)^{T} V V^{T} V V^{T}\left(x_{i}-\mu_{n}\right) \\ &=\left(x_{i}-\mu_{n}\right)^{T}\left(x_{i}-\mu_{n}\right)-\left(x_{i}-\mu_{n}\right)^{T} V V^{T}\left(x_{i}-\mu_{n}\right) \end{aligned}$$

Since the first term does not depend on $V$, the minimization is equivalent to

$\max _{V^{T} V=1} \sum_{i=1}^{n}\left(x_{i}-\mu_{n}\right)^{T} V V^{T}\left(x_{i}-\mu_{n}\right)$

Rewriting the summand:

$\sum_{i=1}^{n}\left(x_{i}-\mu_{n}\right)^{T} V V^{T}\left(x_{i}-\mu_{n}\right)=\sum_{i=1}^{n}\left[V^{T}\left(x_{i}-\mu_{n}\right)\right]^{T}\left[V^{T}\left(x_{i}-\mu_{n}\right)\right]$

By the property of the matrix trace,

$y^Ty = Tr(yy^T)$

$\max\limits _{V^TV=1}\sum\limits_{i=1}^n (x_i - \mu_n)^TVV^T(x_i - \mu_n) =\max\limits _{V^TV=1} (n-1)Tr(V^T\Sigma_nV)$

Dropping the constant factor $n-1$ leaves

$\max\limits _{V^TV=1} Tr(V^T\Sigma_nV)$

which is solved by taking the columns of $V$ to be the top-$d$ eigenvectors of $\Sigma_n$.
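By the spectral theorem, the maximum value of this trace is the sum of the top-$d$ eigenvalues of $\Sigma_n$; a numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))  # correlated features
Sigma_n = np.cov(X, rowvar=False)

d = 2
evals, evecs = np.linalg.eigh(Sigma_n)   # eigenvalues in ascending order
V = evecs[:, -d:]                        # top-d eigenvectors as columns

# the maximized trace equals the sum of the top-d eigenvalues
assert np.isclose(np.trace(V.T @ Sigma_n @ V), evals[-d:].sum())
```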

# PCA Preserves Maximum Variance

$\text{Total Variance}(X_n) = \frac{1}{n} \sum\limits_{i=1}^n||x_i- \mu_n||^2 = \frac {1}{n} \sum\limits_{i=1}^n||x_i - \frac{1}{n}\sum\limits_{j=1}^n x_j||^2$

Projecting onto $V$ and maximizing the retained variance (the constant $\frac{1}{n}$ dropped):

$\max\limits _{V^TV=1}\sum\limits_{i=1}^n ||V^Tx_i - \frac{1}{n}\sum\limits_{i=1}^n V^Tx_i||^2$

$\sum\limits_{i=1}^n ||V^Tx_i - \frac{1}{n}\sum\limits_{i=1}^n V^Tx_i||^2 =\sum\limits_{i=1}^n ||V^T(x_i - \mu_n)||^2= (n-1)Tr(V^T\Sigma_nV)$

$\max\limits_{V^TV = 1}Tr(V^T\Sigma_nV)$

This is exactly the same objective as in the best-affine-fit derivation, so both views yield the same principal directions.
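The identity connecting the projected variance to the trace can be checked directly; a sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 5))
n = len(X)
mu_n = X.mean(axis=0)
Sigma_n = np.cov(X, rowvar=False)
V, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal columns

# sum of squared norms of the projected, centered samples
proj_var = sum(np.linalg.norm(V.T @ (x - mu_n)) ** 2 for x in X)
# equals (n-1) * Tr(V^T Sigma_n V)
assert np.isclose(proj_var, (n - 1) * np.trace(V.T @ Sigma_n @ V))
```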