Data Mining and Analysis: Course Notes
- Reference: *Data Mining and Analysis* by Mohammed J. Zaki and Wagner Meira Jr.
Contents
- Course Notes (Table of Contents)
- Course Notes (Chapter 1)
- Course Notes (Chapter 2)
- Course Notes (Chapter 5)
- Course Notes (Chapter 7)
- Course Notes (Chapter 14)
- Course Notes (Chapter 15)
- Course Notes (Chapter 20)
- Course Notes (Chapter 21)
Chapter 7: Dimensionality Reduction
PCA: Principal Component Analysis
7.1 Background
$$\mathbf{D}=\left(\begin{array}{c|cccc} & X_{1} & X_{2} & \cdots & X_{d} \\ \hline \mathbf{x}_{1} & x_{11} & x_{12} & \cdots & x_{1 d} \\ \mathbf{x}_{2} & x_{21} & x_{22} & \cdots & x_{2 d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_{n} & x_{n 1} & x_{n 2} & \cdots & x_{n d} \end{array}\right)$$
Objects: $\mathbf{x}_{1}^T,\cdots,\mathbf{x}_n^T \in \mathbb{R}^d$. For any $\mathbf{x} \in \mathbb{R}^d$, write $\mathbf{x}=(x_1,\cdots,x_d)^T= \sum\limits_{i=1}^{d}x_i \mathbf{e}_i$,
where $\mathbf{e}_i=(0,\cdots,1,\cdots,0)^T\in\mathbb{R}^d$ is the $i$-th standard basis vector.
Now let $\{\mathbf{u}_i\}_{i=1}^d$ be another orthonormal basis, so that $\mathbf{x}=\sum\limits_{i=1}^{d}a_i \mathbf{u}_i$ with $a_i \in \mathbb{R}$ and $\mathbf{u}_i^T \mathbf{u}_j =\begin{cases} 1, & i=j\\ 0, & i\ne j \end{cases}$
For any $r$ with $1\le r\le d$: $\mathbf{x}=\underbrace{a_1 \mathbf{u}_1+\cdots+a_r \mathbf{u}_r}_{\text{projection}}+ \underbrace{a_{r+1} \mathbf{u}_{r+1}+\cdots+a_d \mathbf{u}_d}_{\text{error}}$
The first $r$ terms form the projection; the remaining terms are the projection error.
Goal: given $D$, find the optimal basis $\{\mathbf{u}_i\}_{i=1}^d$ such that the projection of $D$ onto the subspace spanned by the first $r$ basis vectors is the "best approximation" of $D$, i.e., the projection error is minimal.
7.2 Principal Component Analysis
7.2.1 Best line (1-dimensional) approximation
(First principal component: $r=1$)
Goal: find $\mathbf{u}_1$, written simply as $\mathbf{u}=(u_1,\cdots,u_d)^T$.
Assumptions: $||\mathbf{u}||=1$, i.e. $\mathbf{u}^T\mathbf{u}=1$, and the data are centered: $\hat{\boldsymbol{\mu}}=\frac{1}{n} \sum\limits_{i=1}^n\mathbf{x}_i=\mathbf{0} \in \mathbb{R}^{d}$.
For each $\mathbf{x}_i$ $(i=1,\cdots,n)$, the projection of $\mathbf{x}_i$ along $\mathbf{u}$ is:
$$\mathbf{x}_{i}^{\prime}=\left(\frac{\mathbf{u}^{T} \mathbf{x}_{i}}{\mathbf{u}^{T} \mathbf{u}}\right) \mathbf{u}=\left(\mathbf{u}^{T} \mathbf{x}_{i}\right) \mathbf{u}=a_{i} \mathbf{u},\quad a_{i}=\mathbf{u}^{T} \mathbf{x}_{i}$$
Since $\hat{\boldsymbol{\mu}}=\mathbf{0}$, the projection of $\hat{\boldsymbol{\mu}}$ onto $\mathbf{u}$ is $0$, so the mean of $\mathbf{x}_{1}^{\prime},\cdots,\mathbf{x}_{n}^{\prime}$ is $0$: projection commutes with the mean,
$$Proj(mean(D))=mean(Proj(D))$$
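The projection formula can be checked numerically. A minimal NumPy sketch on synthetic data (the array names `X`, `u`, `a`, `X_proj` are my own, not from the text):

```python
import numpy as np

# Synthetic centered data: mean(D) = 0 after subtracting the column means.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)

u = np.array([1.0, 2.0, 2.0])
u = u / np.linalg.norm(u)        # unit vector, so u^T u = 1

a = X @ u                        # scalar coordinates a_i = u^T x_i
X_proj = np.outer(a, u)          # projected points x_i' = a_i u

# Projection commutes with the mean: the projected points average to 0.
print(np.allclose(X_proj.mean(axis=0), 0.0))
```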
Consider the sample variance of $\mathbf{x}_{1}^{\prime},\cdots,\mathbf{x}_{n}^{\prime}$ along the direction $\mathbf{u}$:
$$\begin{aligned} \sigma_{\mathbf{u}}^{2} &=\frac{1}{n} \sum_{i=1}^{n}\left(a_{i}-\mu_{\mathbf{u}}\right)^{2} \\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\mathbf{u}^{T} \mathbf{x}_{i}\right)^{2} \\ &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{u}^{T}\left(\mathbf{x}_{i} \mathbf{x}_{i}^{T}\right) \mathbf{u} \\ &=\mathbf{u}^{T}\left(\frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_{i} \mathbf{x}_{i}^{T}\right) \mathbf{u} \\ &=\mathbf{u}^{T} \mathbf{\Sigma} \mathbf{u} \end{aligned}$$
where $\mathbf{\Sigma}$ is the sample covariance matrix.
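The identity $\sigma_{\mathbf{u}}^{2}=\mathbf{u}^T\mathbf{\Sigma}\mathbf{u}$ can be verified numerically. A short sketch with synthetic data (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)           # centered data

Sigma = (X.T @ X) / len(X)       # sample covariance with the 1/n convention
u = rng.normal(size=4)
u /= np.linalg.norm(u)           # arbitrary unit direction

a = X @ u                        # a_i = u^T x_i, with mean 0
var_direct = np.mean(a ** 2)     # (1/n) sum (a_i - mu_u)^2, mu_u = 0
var_quad = u @ Sigma @ u         # u^T Sigma u
print(np.isclose(var_direct, var_quad))
```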
Objective:
$$\begin{array}{ll} \max\limits_{\mathbf{u}} & \mathbf{u}^{T} \mathbf{\Sigma} \mathbf{u} \\ \text{s.t.} & \mathbf{u}^T\mathbf{u}-1=0 \end{array}$$
Applying the method of Lagrange multipliers:
$$\max \limits_{\mathbf{u}} J(\mathbf{u})=\mathbf{u}^{T} \Sigma \mathbf{u}-\lambda\left(\mathbf{u}^{T} \mathbf{u}-1\right)$$
Setting the partial derivative to zero:
$$\begin{aligned} \frac{\partial}{\partial \mathbf{u}} J(\mathbf{u}) &=\mathbf{0} \\ \frac{\partial}{\partial \mathbf{u}}\left(\mathbf{u}^{T} \mathbf{\Sigma} \mathbf{u}-\lambda\left(\mathbf{u}^{T} \mathbf{u}-1\right)\right) &=\mathbf{0} \\ 2 \mathbf{\Sigma} \mathbf{u}-2 \lambda \mathbf{u} &=\mathbf{0} \\ \mathbf{\Sigma} \mathbf{u} &=\lambda \mathbf{u} \end{aligned}$$
Note that:
$$\mathbf{u}^{T} \mathbf{\Sigma} \mathbf{u}=\mathbf{u}^{T} \lambda \mathbf{u}=\lambda$$
Hence the solution of the optimization problem takes $\lambda$ to be the largest eigenvalue of $\mathbf{\Sigma}$, and $\mathbf{u}$ to be a unit eigenvector associated with $\lambda$.
Question: does the $\mathbf{u}$ that maximizes $\sigma_{\mathbf{u}}^{2}$ also minimize the projection error?
Define the mean squared error (MSE):
$$\begin{aligned} MSE(\mathbf{u}) &=\frac{1}{n} \sum_{i=1}^{n}\left\|\mathbf{x}_{i}-\mathbf{x}_{i}^{\prime}\right\|^{2} \\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\mathbf{x}_{i}-\mathbf{x}_{i}^{\prime}\right)^{T}\left(\mathbf{x}_{i}-\mathbf{x}_{i}^{\prime}\right) \\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\left\|\mathbf{x}_{i}\right\|^{2}-2 \mathbf{x}_{i}^{T} \mathbf{x}_{i}^{\prime}+\left(\mathbf{x}_{i}^{\prime}\right)^{T} \mathbf{x}_{i}^{\prime}\right)\\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\left\|\mathbf{x}_{i}\right\|^{2}-2 \mathbf{x}_{i}^{T} (\mathbf{u}^{T} \mathbf{x}_{i})\mathbf{u}+\left[(\mathbf{u}^{T} \mathbf{x}_{i})\mathbf{u}\right]^{T} \left[ (\mathbf{u}^{T} \mathbf{x}_{i})\mathbf{u}\right] \right)\\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\left\|\mathbf{x}_{i}\right\|^{2}-2 (\mathbf{u}^{T} \mathbf{x}_{i})\mathbf{x}_{i}^{T} \mathbf{u}+(\mathbf{u}^{T} \mathbf{x}_{i})(\mathbf{x}_{i}^{T} \mathbf{u})\mathbf{u}^{T}\mathbf{u} \right) \\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\left\|\mathbf{x}_{i}\right\|^{2}-\mathbf{u}^{T} \mathbf{x}_{i}\mathbf{x}_{i}^{T} \mathbf{u} \right) \\ &=\frac{1}{n} \sum_{i=1}^{n}\left\|\mathbf{x}_{i}\right\|^{2}-\mathbf{u}^{T} \mathbf{\Sigma} \mathbf{u}\\ &= var(D)-\sigma_{\mathbf{u}}^{2} \end{aligned}$$
The above shows that:
$$var(D)=\sigma_{\mathbf{u}}^{2}+MSE$$
Geometric meaning of $\mathbf{u}$: the direction of the line in $\mathbb{R}^d$ along which the projected data have maximal variance and, since $var(D)$ is fixed, simultaneously minimal MSE.
$\mathbf{u}$ is called the first principal component.
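The decomposition $var(D)=\sigma_{\mathbf{u}}^{2}+MSE$ holds for every unit direction, not only the first principal component. A minimal numerical sketch, assuming synthetic centered data (names are my own):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 4))
X = X - X.mean(axis=0)
n = len(X)
Sigma = (X.T @ X) / n
var_D = np.trace(Sigma)          # var(D) = (1/n) sum ||x_i||^2 for centered data

u = rng.normal(size=4)
u /= np.linalg.norm(u)           # any unit direction

X_proj = np.outer(X @ u, u)      # x_i' = (u^T x_i) u
mse = np.mean(np.sum((X - X_proj) ** 2, axis=1))
sigma_u2 = u @ Sigma @ u
print(np.isclose(var_D, sigma_u2 + mse))
```

Maximizing $\sigma_{\mathbf{u}}^{2}$ therefore minimizes the MSE, since their sum is constant.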
7.2.2 Best 2-dimensional approximation
(Second principal component: $r=2$)
Assume $\mathbf{u}_1$ has been found, i.e. the eigenvector corresponding to the largest eigenvalue of $\mathbf{\Sigma}$.
Goal: find $\mathbf{u}_2$, written simply as $\mathbf{v}$, such that $\mathbf{v}^{T} \mathbf{u}_{1}=0$ and $\mathbf{v}^{T} \mathbf{v} =1$.
Consider the variance of the projections of $\mathbf{x}_{i}$ along $\mathbf{v}$:
$$\begin{array}{ll} \max\limits_{\mathbf{v}} & \sigma_{\mathbf{v}}^{2} = \mathbf{v}^{T} \mathbf{\Sigma} \mathbf{v} \\ \text{s.t.} & \mathbf{v}^T\mathbf{v}-1=0\\ & \mathbf{v}^{T} \mathbf{u}_{1}=0 \end{array}$$
Define:
$$J(\mathbf{v})=\mathbf{v}^{T} \mathbf{\Sigma} \mathbf{v}-\alpha\left(\mathbf{v}^{T} \mathbf{v}-1\right)-\beta\left(\mathbf{v}^{T} \mathbf{u}_{1}-0\right)$$
Taking the partial derivative with respect to $\mathbf{v}$:
$$2 \Sigma \mathbf{v}-2 \alpha \mathbf{v}-\beta \mathbf{u}_{1}=\mathbf{0}$$
Multiplying both sides on the left by $\mathbf{u}_{1}^{T}$:
$$\begin{aligned} 2 \mathbf{u}_{1}^{T}\Sigma \mathbf{v}-2 \alpha \mathbf{u}_{1}^{T}\mathbf{v}-\beta \mathbf{u}_{1}^{T}\mathbf{u}_{1} &=0 \\ 2 \mathbf{u}_{1}^{T}\Sigma \mathbf{v}-\beta &= 0\\ 2 \mathbf{v}^{T}\Sigma \mathbf{u}_{1}-\beta &= 0\\ 2 \mathbf{v}^{T}\lambda_1 \mathbf{u}_{1}-\beta &= 0\\ \beta &= 0 \end{aligned}$$
Substituting back into the original equation:
$$2 \Sigma \mathbf{v}-2 \alpha \mathbf{v}=\mathbf{0} \quad\Rightarrow\quad \Sigma \mathbf{v}=\alpha \mathbf{v}$$
So $\mathbf{v}$ is also an eigenvector of $\mathbf{\Sigma}$.
Since $\sigma_{\mathbf{v}}^{2} = \mathbf{v}^{T} \mathbf{\Sigma} \mathbf{v} =\alpha$, $\alpha$ should be the second largest eigenvalue of $\mathbf{\Sigma}$, with $\mathbf{v}$ the corresponding unit eigenvector.
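Because $\mathbf{\Sigma}$ is symmetric, its eigenvectors are automatically orthogonal, so the constraint $\mathbf{v}^T\mathbf{u}_1=0$ is satisfied by the second eigenvector. A quick check on synthetic data (a sketch; names are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 3))
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / len(X)

lam, U = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]        # reorder to descending
lam, U = lam[order], U[:, order]
u1, v = U[:, 0], U[:, 1]

print(np.isclose(v @ u1, 0.0))            # constraint v^T u_1 = 0
print(np.allclose(Sigma @ v, lam[1] * v)) # Sigma v = alpha v, alpha = lambda_2
```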
Question 1: does the $\mathbf{v}$ obtained above (i.e. $\mathbf{u}_2$), taken together with $\mathbf{u}_1$, maximize the total variance of the projection of $D$ onto $span\{\mathbf{u}_1, \mathbf{u}_2 \}$?
Let $\mathbf{x}_i=\underbrace{a_{i1} \mathbf{u}_1+a_{i2}\mathbf{u}_2}_{\text{projection}}+\cdots$
Then the coordinates of the projection of $\mathbf{x}_i$ onto $span\{\mathbf{u}_1, \mathbf{u}_2 \}$ are $\mathbf{a}_{i}=(a_{i1},a_{i2})^T=(\mathbf{u}_1^{T}\mathbf{x}_i,\mathbf{u}_2^{T}\mathbf{x}_i)^{T}$.
Let $\mathbf{U}_{2}=\left(\begin{array}{cc} \mid & \mid \\ \mathbf{u}_{1} & \mathbf{u}_{2} \\ \mid & \mid \end{array}\right)$; then $\mathbf{a}_{i}=\mathbf{U}_{2}^{T} \mathbf{x}_{i}$.
The total projected variance is:
$$\begin{aligned} \operatorname{var}(\mathbf{A}) &=\frac{1}{n} \sum_{i=1}^{n}\left\|\mathbf{a}_{i}-\mathbf{0}\right\|^{2} \\ &=\frac{1}{n} \sum_{i=1}^{n}\left(\mathbf{U}_{2}^{T} \mathbf{x}_{i}\right)^{T}\left(\mathbf{U}_{2}^{T} \mathbf{x}_{i}\right) \\ &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_{i}^{T}\left(\mathbf{U}_{2} \mathbf{U}_{2}^{T}\right) \mathbf{x}_{i}\\ &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_{i}^{T}\left( \mathbf{u}_{1}\mathbf{u}_{1}^T + \mathbf{u}_{2}\mathbf{u}_{2}^T \right) \mathbf{x}_{i}\\ &=\mathbf{u}_{1}^T\mathbf{\Sigma} \mathbf{u}_{1} + \mathbf{u}_{2}^T\mathbf{\Sigma} \mathbf{u}_{2}\\ &= \lambda_1 +\lambda_2 \end{aligned}$$
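The result $\operatorname{var}(\mathbf{A})=\lambda_1+\lambda_2$ is easy to confirm numerically. A sketch with synthetic data (names are my own):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / len(X)

lam, U = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]        # descending eigenvalues
lam, U = lam[order], U[:, order]

U2 = U[:, :2]                        # columns u_1, u_2
A = X @ U2                           # a_i = U_2^T x_i
total_var = np.mean(np.sum(A ** 2, axis=1))   # (1/n) sum ||a_i||^2
print(np.isclose(total_var, lam[0] + lam[1]))
```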
Question 2: is the mean squared error minimal?
Here,
$$\mathbf{x}_{i}^{\prime}=\mathbf{U}_{2}\mathbf{U}_{2}^{T} \mathbf{x}_{i}$$
$$\begin{aligned} MSE &= \frac{1}{n} \sum_{i=1}^{n}\left\|\mathbf{x}_{i}-\mathbf{x}_{i}^{\prime}\right\|^{2} \\ &= \frac{1}{n} \sum_{i=1}^{n}\left\|\mathbf{x}_{i}\right\|^{2} - \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_{i}^{T}\left(\mathbf{U}_{2} \mathbf{U}_{2}^{T}\right) \mathbf{x}_{i}\\ &= var(D) - \lambda_1 - \lambda_2 \end{aligned}$$
Conclusions:
- The sum of the first $r$ eigenvalues of $\mathbf{\Sigma}$, $\lambda_1+\cdots+\lambda_r$ $(\lambda_1\ge\cdots\ge\lambda_r)$, gives the maximal total projected variance;
- $var(D)-\sum\limits_{i=1}^r \lambda_i$ gives the minimal MSE;
- The eigenvectors $\mathbf{u}_{1},\cdots,\mathbf{u}_{r}$ corresponding to $\lambda_1,\cdots,\lambda_r$ span the $r$-dimensional principal subspace.
7.2.3 Generalization
For $\Sigma_{d\times d}$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and centered data:
$\sum\limits_{i=1}^r\lambda_i$: maximal total projected variance;
$var(D)-\sum\limits_{i=1}^r\lambda_i$: minimal MSE.
In practice, to choose an appropriate $r$, compare the ratio $\frac{\sum\limits_{i=1}^r\lambda_i}{var(D)}$ against a given threshold $\alpha$.
Algorithm 7.1 (PCA):
Input: $D$, $\alpha$
Output: $A$ (the reduced-dimensional data)
- $\boldsymbol{\mu} = \frac{1}{n}\sum\limits_{i=1}^n\mathbf{x}_i$;
- $\mathbf{Z}=\mathbf{D}-\mathbf{1}\cdot \boldsymbol{\mu} ^T$;
- $\mathbf{\Sigma}=\frac{1}{n}\mathbf{Z}^T\mathbf{Z}$;
- $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \longleftarrow$ the eigenvalues of $\mathbf{\Sigma}$ (in descending order);
- $\mathbf{u}_1,\mathbf{u}_2,\cdots,\mathbf{u}_d \longleftarrow$ the corresponding eigenvectors of $\mathbf{\Sigma}$ (orthonormal);
- Compute $\frac{\sum\limits_{i=1}^r\lambda_i}{var(D)}$ and choose the smallest $r$ for which this ratio exceeds $\alpha$;
- $\mathbf{U}_r=(\mathbf{u}_1,\mathbf{u}_2,\cdots,\mathbf{u}_r)$;
- $A=\{\mathbf{a}_i\mid\mathbf{a}_i=\mathbf{U}_r^T\mathbf{x}_i, i=1,\cdots,n\}$.
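Algorithm 7.1 can be sketched directly in NumPy. This is a minimal implementation under the note's conventions ($1/n$ covariance, centered coordinates); the function and variable names are my own:

```python
import numpy as np

def pca(D, alpha):
    """Sketch of Algorithm 7.1: project D onto its first r principal components."""
    n = len(D)
    mu = D.mean(axis=0)                 # sample mean
    Z = D - mu                          # centered data Z = D - 1 mu^T
    Sigma = (Z.T @ Z) / n               # sample covariance (1/n convention)
    lam, U = np.linalg.eigh(Sigma)      # eigendecomposition (ascending)
    order = np.argsort(lam)[::-1]       # sort eigenpairs in descending order
    lam, U = lam[order], U[:, order]
    frac = np.cumsum(lam) / lam.sum()   # var(D) = sum of eigenvalues = trace(Sigma)
    r = int(np.searchsorted(frac, alpha) + 1)  # smallest r with ratio >= alpha
    Ur = U[:, :r]
    A = Z @ Ur                          # rows a_i = U_r^T x_i
    return A, lam, r

# usage on synthetic data with very unequal column scales
rng = np.random.default_rng(2)
D = rng.normal(size=(300, 5)) * np.array([3.0, 2.0, 1.0, 0.2, 0.1])
A, lam, r = pca(D, alpha=0.95)
print(A.shape, r)
```

The variance of each retained coordinate equals the corresponding eigenvalue, which is a convenient sanity check.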
7.3 Kernel PCA
$\phi:\mathcal{I}\to \mathcal{F}\subseteq \mathbb{R}^d$
$K:\mathcal{I}\times\mathcal{I}\to \mathbb{R}$
$K(\mathbf{x}_i,\mathbf{x}_j)=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)$
Given: $\mathbf{K}=[K(\mathbf{x}_i,\mathbf{x}_j)]_{n\times n}$ and $\mathbf{\Sigma}_{\phi}=\frac{1}{n}\sum\limits_{i=1}^n\phi(\mathbf{x}_i)\phi(\mathbf{x}_i)^T$.
Objects: $\phi(\mathbf{x}_1),\phi(\mathbf{x}_2),\cdots,\phi(\mathbf{x}_n)\in \mathbb{R}^d$. Assume $\frac{1}{n}\sum\limits_{i=1}^{n}\phi(\mathbf{x}_i)=\mathbf{0}$, i.e. $\mathbf{K} \to \hat{\mathbf{K}}$ has already been centered.
Goal: find $\mathbf{u}$ and $\lambda$ such that $\mathbf{\Sigma}_{\phi}\mathbf{u}=\lambda\mathbf{u}$.
$$\begin{aligned} \frac{1}{n}\sum\limits_{i=1}^n\phi(\mathbf{x}_i)[\phi(\mathbf{x}_i)^T\mathbf{u}] &=\lambda\mathbf{u}\\ \sum\limits_{i=1}^n\left[\frac{\phi(\mathbf{x}_i)^T\mathbf{u}}{n\lambda}\right] \phi(\mathbf{x}_i)&=\mathbf{u} \end{aligned}$$
So $\mathbf{u}$ is a linear combination of all the mapped data points.
Let $c_i=\frac{\phi(\mathbf{x}_i)^T\mathbf{u}}{n\lambda}$; then $\mathbf{u}=\sum\limits_{i=1}^nc_i \phi(\mathbf{x}_i)$. Substituting back into the original equation:
$$\begin{aligned} \left(\frac{1}{n} \sum_{i=1}^{n} \phi\left(\mathbf{x}_{i}\right) \phi\left(\mathbf{x}_{i}\right)^{T}\right)\left(\sum_{j=1}^{n} c_{j} \phi\left(\mathbf{x}_{j}\right)\right) &=\lambda \sum_{i=1}^{n} c_{i} \phi\left(\mathbf{x}_{i}\right) \\ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} c_{j} \phi\left(\mathbf{x}_{i}\right) \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) &=\lambda \sum_{i=1}^{n} c_{i} \phi\left(\mathbf{x}_{i}\right) \\ \sum_{i=1}^{n}\left(\phi\left(\mathbf{x}_{i}\right) \sum_{j=1}^{n} c_{j} K(\mathbf{x}_i, \mathbf{x}_j) \right) &=n \lambda \sum_{i=1}^{n} c_{i} \phi\left(\mathbf{x}_{i}\right) \end{aligned}$$
Note that here $\mathbf{K}=\hat{\mathbf{K}}$ has already been centered.
For every $k$ $(1\le k\le n)$, multiply both sides on the left by $\phi^T(\mathbf{x}_{k})$:
$$\begin{aligned} \sum_{i=1}^{n}\left(\phi^T(\mathbf{x}_{k}) \phi\left(\mathbf{x}_{i}\right) \sum_{j=1}^{n} c_{j} K(\mathbf{x}_i, \mathbf{x}_j) \right) &=n \lambda \sum_{i=1}^{n} c_{i} \phi^T(\mathbf{x}_{k}) \phi\left(\mathbf{x}_{i}\right) \\ \sum_{i=1}^{n}\left(K(\mathbf{x}_k, \mathbf{x}_i) \sum_{j=1}^{n} c_{j} K(\mathbf{x}_i, \mathbf{x}_j) \right) &=n \lambda \sum_{i=1}^{n} c_{i} K(\mathbf{x}_k, \mathbf{x}_i) \end{aligned}$$
Let $\mathbf{K}_{i}=\left(K\left(\mathbf{x}_{i}, \mathbf{x}_{1}\right), K\left(\mathbf{x}_{i}, \mathbf{x}_{2}\right), \cdots, K\left(\mathbf{x}_{i}, \mathbf{x}_{n}\right)\right)^{T}$ (the $i$-th row of the kernel matrix, so that $\mathbf{K}=\begin{bmatrix} \mathbf{K}_1^T \\ \vdots \\ \mathbf{K}_n^T \end{bmatrix}$), and let $\mathbf{c}=(c_1,c_2,\cdots,c_n)^T$. Then:
$$\begin{aligned} \sum_{i=1}^{n}K(\mathbf{x}_k, \mathbf{x}_i) \mathbf{K}^T_i\mathbf{c} &=n \lambda \mathbf{K}^T_k\mathbf{c},\quad k=1,2,\cdots,n \\ \mathbf{K}^T_k\begin{bmatrix} \mathbf{K}_1^T \\ \vdots \\ \mathbf{K}_n^T \end{bmatrix}\mathbf{c} &=n \lambda \mathbf{K}^T_k\mathbf{c}\\ \mathbf{K}^T_k\mathbf{K}\mathbf{c} &=n \lambda \mathbf{K}^T_k\mathbf{c} \end{aligned}$$
That is,
$$\mathbf{K}^2\mathbf{c}=n\lambda \mathbf{K}\mathbf{c}$$
Assuming $\mathbf{K}^{-1}$ exists:
$$\begin{aligned} \mathbf{K}^2\mathbf{c}&=n\lambda \mathbf{K}\mathbf{c}\\ \mathbf{K}\mathbf{c}&=n\lambda \mathbf{c}\\ \mathbf{K}\mathbf{c}&= \eta\mathbf{c},\quad \eta=n\lambda \end{aligned}$$
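The relation $\eta = n\lambda$ between kernel-matrix eigenvalues and covariance eigenvalues can be checked with a linear kernel, where $\phi$ is the identity and both spectra are directly computable. A sketch on synthetic centered data (names are my own):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)            # phi = identity, already centered

K = X @ X.T                       # linear kernel: K_ij = x_i^T x_j (n x n)
Sigma = (X.T @ X) / len(X)        # feature-space covariance (d x d)

eta = np.sort(np.linalg.eigvalsh(K))[::-1]      # eigenvalues of K, descending
lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]  # eigenvalues of Sigma, descending

# The nonzero eigenvalues satisfy lambda = eta / n.
print(np.allclose(eta[:4] / len(X), lam))
```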
Conclusion: $\frac{\eta_1}{n}\ge\frac{\eta_2}{n}\ge\cdots\ge\frac{\eta_n}{n}$ give the projected variances of $\phi(\mathbf{x}_1),\phi(\mathbf{x}_2),\cdots,\phi(\mathbf{x}_n)$ in the feature space; the total variance captured by the first $r$ components is $\sum\limits_{i=1}^{r}\frac{\eta_i}{n}$, where $\eta_1\ge\eta_2\ge\cdots\ge\eta_n$ are the eigenvalues of $\mathbf{K}$.
Question: can we compute the projections of $\phi(\mathbf{x}_1),\phi(\mathbf{x}_2),\cdots,\phi(\mathbf{x}_n)$ onto the principal directions (i.e. the reduced-dimensional data)?
Let $\mathbf{u}_1,\cdots,\mathbf{u}_d$ be the eigenvectors of $\mathbf{\Sigma}_{\phi}$; then $\phi(\mathbf{x}_j)=a_1\mathbf{u}_1+\cdots+a_d\mathbf{u}_d$, where for $k=1,2,\cdots,d$:
$$\begin{aligned} a_k &= \phi(\mathbf{x}_j)^T\mathbf{u}_k\\ &= \phi(\mathbf{x}_j)^T\sum\limits_{i=1}^nc_{ki} \phi(\mathbf{x}_i)\\ &= \sum\limits_{i=1}^nc_{ki} \phi(\mathbf{x}_j)^T\phi(\mathbf{x}_i)\\ &= \sum\limits_{i=1}^nc_{ki} K(\mathbf{x}_j,\mathbf{x}_i) \end{aligned}$$
Algorithm 7.2: Kernel PCA ($\mathcal{F}\subseteq \mathbb{R}^d$)
Input: $K$, $\alpha$
Output: $A$ (the projection coordinates of the reduced-dimensional data)
- $\hat{\mathbf{K}} :=\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right) \mathbf{K}\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right)$;
- $\eta_1,\eta_2,\cdots,\eta_d \longleftarrow$ the eigenvalues of $\hat{\mathbf{K}}$, keeping only the first $d$;
- $\mathbf{c}_1,\mathbf{c}_2,\cdots,\mathbf{c}_d \longleftarrow$ the corresponding eigenvectors of $\hat{\mathbf{K}}$ (orthonormal);
- $\mathbf{c}_i \leftarrow \frac{1}{\sqrt{\eta_i}}\cdot \mathbf{c}_i$, $i=1,\cdots,d$;
- Choose the smallest $r$ such that $\frac{\sum\limits_{i=1}^r\frac{\eta_i}{n}}{\sum\limits_{i=1}^d\frac{\eta_i}{n}}\ge \alpha$;
- $\mathbf{C}_r=(\mathbf{c}_1,\mathbf{c}_2,\cdots,\mathbf{c}_r)$;
- $A=\{\mathbf{a}_i\mid\mathbf{a}_i=\mathbf{C}_r^T\mathbf{K}_i, i=1,\cdots,n\}$.
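Algorithm 7.2 can be sketched in NumPy as follows. This is a minimal implementation under the note's conventions (double-centering, eigenvectors rescaled by $1/\sqrt{\eta_i}$); the function and variable names are my own, and small or negative numerical eigenvalues are dropped as an implementation choice:

```python
import numpy as np

def kernel_pca(K, alpha):
    """Sketch of Algorithm 7.2: projection coordinates from a kernel matrix."""
    n = len(K)
    ones = np.ones((n, n)) / n
    # Double centering: (I - 1/n 1) K (I - 1/n 1)
    Khat = K - ones @ K - K @ ones + ones @ K @ ones
    eta, C = np.linalg.eigh(Khat)
    order = np.argsort(eta)[::-1]          # descending eigenvalues
    eta, C = eta[order], C[:, order]
    pos = eta > 1e-9                       # keep numerically positive eigenvalues
    eta, C = eta[pos], C[:, pos]
    C = C / np.sqrt(eta)                   # rescale c_i <- c_i / sqrt(eta_i)
    frac = np.cumsum(eta) / eta.sum()      # captured-variance ratio
    r = int(np.searchsorted(frac, alpha) + 1)
    A = Khat @ C[:, :r]                    # rows a_i = C_r^T K_i
    return A, r

# usage with a linear kernel (chosen here only for illustration)
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 3))
K = X @ X.T
A, r = kernel_pca(K, alpha=0.9)
print(A.shape, r)
```

With a linear kernel this reproduces ordinary PCA scores, which makes it a convenient correctness check before switching to a nonlinear kernel.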