Machine Learning Foundations: Nonlinear Transformation


FlameAlpha


Article link: https://blog.csdn.net/Flame_alone/article/details/105725186


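This article discusses how, when data are not linearly separable, a nonlinear transform can map the original feature space into a higher-dimensional space in which the data become linearly separable. It introduces basic nonlinear transforms, namely the quadratic and polynomial hypothesis sets, and discusses the price of nonlinear transformation: how the VC dimension and the training error of the model change as the transform becomes more complex.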


Nonlinear Transformation

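The previous sections covered several linear models, but what if the data are not linearly separable? The basic idea is to map the sample (feature) space $\mathcal{X}$ to a space $\mathcal{Z}$. If the data are linearly separable in $\mathcal{Z}$, a linear model can be applied to the data there.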

This mapping is called a (nonlinear) feature transform $\mathbf{\Phi}$. It implements:
$\mathbf{x} \in \mathcal{X} \stackrel{\mathbf{\Phi}}{\longmapsto} \mathbf{z} \in \mathcal{Z}$
The basic learning steps are:

transform original data $\left\{ \left( \mathbf{x}_n , y_n \right) \right\}$ to $\left\{ \left( \mathbf{z}_n = \mathbf{\Phi} \left( \mathbf{x}_n \right) , y_n \right) \right\}$
get a good perceptron $\tilde{\mathbf{w}}$ using $\left\{ \left( \mathbf{z}_n = \mathbf{\Phi} \left( \mathbf{x}_n \right) , y_n \right) \right\}$ and your favorite linear classification algorithm $\mathcal{A}$
return $g(\mathbf{x}) = \operatorname{sign} \left( \tilde{\mathbf{w}}^T \mathbf{\Phi}(\mathbf{x}) \right)$
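The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring-shaped toy data, the helper names (`phi2`, `pla`, `make_ring_data`) and the choice of the perceptron learning algorithm as $\mathcal{A}$ are all made up for this example and do not come from the original article.

```python
import math

def phi2(x):
    """Quadratic feature transform for a 2-D input x = (x1, x2)."""
    x1, x2 = x
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pla(zs, ys, max_epochs=1000):
    """Perceptron learning algorithm run on already-transformed points zs."""
    w = [0.0] * len(zs[0])
    for _ in range(max_epochs):
        mistakes = 0
        for z, y in zip(zs, ys):
            if y * dot(w, z) <= 0:  # misclassified (or on the boundary)
                w = [wi + y * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: done
            break
    return w

def make_ring_data(n_angles=12):
    """Hypothetical toy data: +1 on a circle of radius 0.5, -1 on radius 1.5."""
    xs, ys = [], []
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        xs.append((0.5 * math.cos(t), 0.5 * math.sin(t)))
        ys.append(+1)
        xs.append((1.5 * math.cos(t), 1.5 * math.sin(t)))
        ys.append(-1)
    return xs, ys

# Step 1: transform the original data into Z space.
xs, ys = make_ring_data()
zs = [phi2(x) for x in xs]

# Step 2: run a linear classification algorithm in Z space.
w_tilde = pla(zs, ys)

# Step 3: the final hypothesis works on the original x.
def g(x):
    return 1 if dot(w_tilde, phi2(x)) > 0 else -1

train_errors = sum(g(x) != y for x, y in zip(xs, ys))
print("training errors:", train_errors)
```

No straight line separates these two rings in $\mathcal{X}$, but after the transform they are linearly separable in $\mathcal{Z}$, so the perceptron stops with no training errors.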

General Nonlinear Transforms

General Quadratic Hypothesis Set

For a two-dimensional input $\mathbf{x} = (x_1, x_2)$, the basic form is:
$\Phi_2(\mathbf{x}) = \left( 1 , x_1 , x_2 , x_1^2 , x_1 x_2 , x_2^2 \right)$
It has the following properties:

can implement all possible quadratic curve boundaries: circle, ellipse, rotated ellipse, hyperbola, parabola, …
include lines and constants as degenerate cases
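To make these properties concrete, the sketch below picks a few weight vectors by hand and evaluates $\operatorname{sign}(\tilde{\mathbf{w}}^T \Phi_2(\mathbf{x}))$. The specific weights and test points are illustrative choices, not values from the original article.

```python
def phi2(x):
    """Quadratic feature transform (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

def h(x, w):
    """Linear classifier in Z space, i.e. a quadratic boundary in X space."""
    score = sum(wi * zi for wi, zi in zip(w, phi2(x)))
    return 1 if score > 0 else -1

# Hand-picked weights (illustrative assumptions):
w_circle = (0.6, 0.0, 0.0, -1.0, 0.0, -1.0)   # 0.6 - x1^2 - x2^2
w_ellipse = (1.0, 0.0, 0.0, -1.0, 0.0, -4.0)  # 1 - x1^2 - 4*x2^2
w_line = (-1.0, 1.0, 1.0, 0.0, 0.0, 0.0)      # x1 + x2 - 1 (degenerate case)

print(h((0.0, 0.0), w_circle), h((1.0, 1.0), w_circle))
print(h((0.9, 0.0), w_ellipse), h((0.0, 0.9), w_ellipse))
print(h((1.0, 1.0), w_line), h((0.0, 0.0), w_line))
```

Setting the three quadratic weights to zero, as in `w_line`, recovers an ordinary line, which is why lines count as a degenerate case of the quadratic hypothesis set.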

General Polynomial Hypothesis Set

For a $d$-dimensional input, the basic form is:
$\begin{aligned} \Phi_0(\mathbf{x}) = (1), \quad \Phi_1(\mathbf{x}) &= \left( \Phi_0(\mathbf{x}), \quad x_1, x_2, \ldots, x_d \right) \\ \Phi_2(\mathbf{x}) &= \left( \Phi_1(\mathbf{x}), \quad x_1^2, x_1 x_2, \ldots, x_d^2 \right) \\ \Phi_3(\mathbf{x}) &= \left( \Phi_2(\mathbf{x}), \quad x_1^3, x_1^2 x_2, \ldots, x_d^3 \right) \\ &\;\;\vdots \\ \Phi_Q(\mathbf{x}) &= \left( \Phi_{Q-1}(\mathbf{x}), \quad x_1^Q, x_1^{Q-1} x_2, \ldots, x_d^Q \right) \end{aligned}$
After the feature transform, the hypothesis sets can be written as
$\begin{array}{ccccccccccc} \mathcal{H}_{\Phi_0} & \subset & \mathcal{H}_{\Phi_1} & \subset & \mathcal{H}_{\Phi_2} & \subset & \mathcal{H}_{\Phi_3} & \subset & \ldots & \subset & \mathcal{H}_{\Phi_Q} \\ \| & & \| & & \| & & \| & & & & \| \\ \mathcal{H}_0 & & \mathcal{H}_1 & & \mathcal{H}_2 & & \mathcal{H}_3 & & \ldots & & \mathcal{H}_Q \end{array}$
Drawn as a diagram, each hypothesis set sits inside the next one, so this structure is called nested $\mathcal{H}_i$.
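The recursive definition of $\Phi_Q$ can be sketched as a small Python function that lists all monomials degree by degree. The function name `poly_transform` is hypothetical, and the ordering of monomials within each degree is just one reasonable choice.

```python
from itertools import combinations_with_replacement
from math import prod

def poly_transform(x, Q):
    """Return all monomials of degree 0..Q in the coordinates of x.

    Features are grouped by degree, so the result for Q - 1 is a
    prefix of the result for Q, mirroring the nested definition.
    """
    features = []
    for degree in range(Q + 1):
        # Each multiset of `degree` coordinate indices gives one monomial;
        # the empty multiset (degree 0) gives the constant 1.
        for idx in combinations_with_replacement(range(len(x)), degree):
            features.append(prod(x[i] for i in idx))
    return features

x = (2, 3)
print(poly_transform(x, 1))
print(poly_transform(x, 2))
print(poly_transform(x, 3))
```

Because the output for $Q-1$ is a prefix of the output for $Q$, any weight vector for $\Phi_{Q-1}$ can be padded with zeros to give the same hypothesis under $\Phi_Q$, which is exactly the nesting $\mathcal{H}_{Q-1} \subset \mathcal{H}_Q$.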

The Price of Nonlinear Transformation

For the polynomial transforms, let $g_i = \operatorname{argmin}_{h \in \mathcal{H}_i} E_{\mathrm{in}}(h)$. Because the hypothesis sets are nested, the following holds:
$\begin{array} { c c c c c c c c c} \mathcal { H } _ { 0 } & \subset & \mathcal { H } _ { 1 } & \subset & \mathcal { H } _ { 2 } & \subset & \mathcal { H } _ { 3 } & \subset & \cdots \\ d _ { \mathrm { VC } } \left( \mathcal { H } _ { 0 } \right) & \leq & d _ { \mathrm { VC } } \left( \mathcal { H } _ { 1 } \right) & \leq & d _ { \mathrm { VC } } \left( \mathcal { H } _ { 2 } \right) & \leq & d _ { \mathrm { VC } } \left( \mathcal { H } _ { 3 } \right) & \leq & \cdots \\ E _ { \mathrm { in } } \left( g _ { 0 } \right) & \geq & E _ { \mathrm { in } } \left( g _ { 1 } \right) & \geq & E _ { \mathrm { in } } \left( g _ { 2 } \right) & \geq & E _ { \mathrm { in } } \left( g _ { 3 } \right) & \geq & \cdots \end{array}$
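The $E_{\mathrm{in}}$ chain follows from the nesting: any hypothesis in $\mathcal{H}_1$ is also in $\mathcal{H}_2$ once its weight vector is padded with zeros, so the best hypothesis in $\mathcal{H}_2$ can never do worse on the training data. The sketch below checks this on a tiny hand-made data set; the data, the weights and the helper names are illustrative assumptions, and the weights are picked by hand instead of being found by a learning algorithm.

```python
def phi1(x):
    x1, x2 = x
    return (1.0, x1, x2)

def phi2(x):
    x1, x2 = x
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

def e_in(w, phi, data):
    """Fraction of (x, y) pairs misclassified by sign(w . phi(x))."""
    wrong = 0
    for x, y in data:
        score = sum(wi * zi for wi, zi in zip(w, phi(x)))
        wrong += (1 if score > 0 else -1) != y
    return wrong / len(data)

# Hypothetical data: two positive points near the origin, four negative
# points around them.
data = [
    ((0.0, 0.0), 1), ((0.25, 0.25), 1),
    ((2.0, 0.0), -1), ((0.0, 2.0), -1),
    ((-2.0, 0.0), -1), ((0.0, -2.0), -1),
]

w1 = (1.0, -1.0, -1.0)                 # a line: 1 - x1 - x2
w1_in_h2 = w1 + (0.0, 0.0, 0.0)        # the same hypothesis, viewed in H_2
w2 = (1.0, 0.0, 0.0, -1.0, 0.0, -1.0)  # a circle: 1 - x1^2 - x2^2

print(e_in(w1, phi1, data), e_in(w1_in_h2, phi2, data), e_in(w2, phi2, data))
```

The padded vector `w1_in_h2` reproduces the error of `w1` exactly, while the circle `w2`, which exists only in $\mathcal{H}_2$, separates this data perfectly.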

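From the formula derived earlier, $\Phi_Q$ has $\underbrace{1}_{w_0} + \underbrace{\tilde{d}}_{\text{others}}$ dimensions, and counting the monomials of degree at most $Q$ in $d$ variables gives $1 + \tilde{d} = \binom{Q+d}{d} = O\left(Q^d\right)$. Large $Q$ therefore means large $d_{\mathrm{VC}}$: the hypothesis set becomes more powerful, but the model complexity keeps growing with it.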
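To get a feel for how fast the dimension grows, the count $\binom{Q+d}{d}$ can be tabulated directly. This is a minimal sketch; the function name `num_features` and the chosen values of $d$ and $Q$ are only for illustration.

```python
from math import comb

def num_features(Q, d):
    """Number of monomials of degree <= Q in d variables, constant included."""
    return comb(Q + d, d)

for d in (2, 10):
    counts = [num_features(Q, d) for Q in range(1, 6)]
    print(f"d = {d}: dimensions for Q = 1..5 ->", counts)
```

For a 10-dimensional input, a cubic transform already produces 286 dimensions in $\mathcal{Z}$, which gives a sense of the price paid in model complexity.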