Support Vector Machines and Kernel Functions

Latest recommended article published on 2023-11-14 11:07:07

蓬某某

Article link: https://blog.csdn.net/wang_yunpeng/article/details/104897590

1. Support vector machines

1.1. From logistic regression to support vector machines

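The logistic regression model is:
$\min_{\theta} \frac{1}{m}\left[ \sum_{i=1}^{m}y^{(i)}\left(-\log h_{\theta}(x^{(i)})\right)+(1-y^{(i)})\left(-\log(1-h_{\theta}(x^{(i)}))\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
The term $-\log h(x)=-\log\frac{1}{1+e^{-x}}$ decreases toward 0 as $x$ grows:
[Figure: plot of $-\log h(x)$]
The term $-\log(1-h(x))=-\log\left(1-\frac{1}{1+e^{-x}}\right)$ grows from 0 as $x$ increases:
[Figure: plot of $-\log(1-h(x))$]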
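The shape of the two cost terms can be checked numerically. The following is a minimal sketch; the function names `softplus`, `neg_log_h`, and `neg_log_one_minus_h` are illustrative and do not come from the original article. It relies on the identities $-\log h(z)=\log(1+e^{-z})$ and $-\log(1-h(z))=\log(1+e^{z})$, which avoid overflow for large $|z|$.

```python
import math

def softplus(t):
    """Numerically stable log(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def neg_log_h(z):
    """-log h(z) with h(z) = 1 / (1 + e^(-z)); the cost term used when y = 1."""
    return softplus(-z)

def neg_log_one_minus_h(z):
    """-log(1 - h(z)); the cost term used when y = 0."""
    return softplus(z)

if __name__ == "__main__":
    # Tabulate both terms on a few points to see the opposite trends.
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"z={z:+.1f}  -log h={neg_log_h(z):.4f}  "
              f"-log(1-h)={neg_log_one_minus_h(z):.4f}")
```

Both terms equal $\log 2$ at $z=0$; the first falls toward 0 for large positive $z$, the second for large negative $z$.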
To keep the logistic regression error small, each example must lie on the correct side of the decision boundary:
when $y = 1$, $\theta^Tx^{(i)} \ge 0$;
when $y = 0$, $\theta^Tx^{(i)} \le 0$.
The support vector machine is stricter. To make its error minimal we need:
when $y = 1$, $\theta^Tx^{(i)} \ge 1$;
when $y = 0$, $\theta^Tx^{(i)} \le -1$.
The support vector machine objective is then:
$\min_{\theta} C\left[ \sum_{i=1}^{m}y^{(i)}\text{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\text{cost}_0(\theta^Tx^{(i)})\right]+\frac{1}{2}\sum_{j=1}^{n}\theta_j^2$
where:
$C=\frac{1}{\lambda} \\ \text{cost}_1(\theta^Tx^{(i)})= \begin{cases} 0, &\text{if}\ \theta^Tx^{(i)} \ge 1\\ -a_1\theta^Tx^{(i)}+b_1, &\text{otherwise} \end{cases} \\ \text{cost}_0(\theta^Tx^{(i)})= \begin{cases} 0, &\text{if}\ \theta^Tx^{(i)} \le -1\\ a_0\theta^Tx^{(i)}+b_0, &\text{otherwise} \end{cases} \\ (a_1>0, b_1>0, a_0>0, b_0>0)$
[Figure: plots of $\text{cost}_1$ and $\text{cost}_0$]
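The piecewise costs and the objective can be written out directly. This is a sketch under two assumptions that the article does not state: the slopes and offsets are set to $a_1=b_1=a_0=b_0=1$ (so each cost is continuous at its kink, as in the usual hinge loss), and each input vector starts with a constant 1 so that `theta[0]` acts as an unregularized intercept. The function names are illustrative.

```python
def cost_1(z, a1=1.0, b1=1.0):
    """Cost for a positive example (y = 1): zero once z >= 1, linear otherwise."""
    return 0.0 if z >= 1 else -a1 * z + b1

def cost_0(z, a0=1.0, b0=1.0):
    """Cost for a negative example (y = 0): zero once z <= -1, linear otherwise."""
    return 0.0 if z <= -1 else a0 * z + b0

def svm_objective(theta, X, y, C=1.0):
    """C * (sum of per-example costs) + 0.5 * sum_{j>=1} theta_j^2.

    Assumption: every row of X starts with 1, so theta[0] is the intercept
    and is left out of the regularization term.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * v for t, v in zip(theta, x_i))
        total += y_i * cost_1(z) + (1 - y_i) * cost_0(z)
    reg = 0.5 * sum(t * t for t in theta[1:])
    return C * total + reg

if __name__ == "__main__":
    X = [[1.0, 1.0], [1.0, -1.0]]   # leading 1 is the intercept feature
    y = [1, 0]
    # Both examples satisfy their margin, so only the regularizer remains.
    print(svm_objective([0.0, 2.0], X, y))  # 2.0
```

With these parameters a positive example pays nothing only once $\theta^Tx \ge 1$, which is the stricter requirement described above.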
If we can find a $\theta$ such that:
when $y = 1$, $\theta^Tx^{(i)} \ge 1$;
when $y = 0$, $\theta^Tx^{(i)} \le -1$;
then every cost term is zero and the objective simplifies to:
$J(\theta)=\frac{1}{2}\sum_{j=1}^{n}\theta_j^2 \\ \text{s.t.} \begin{cases} \theta^Tx^{(i)} \ge 1, &\text{if}\ y^{(i)} = 1\\ \theta^Tx^{(i)} \le -1, &\text{if}\ y^{(i)} = 0 \end{cases} \\$
Relabel the negative examples as $-1$ and the positive examples as $+1$. The two constraints then merge into one, and the support vector machine problem becomes:
$\begin{aligned} \min_{\theta}\ \ \ \ &\frac{1}{2}\|\theta\|^2 \\ \text{s.t.} \ \ \ &y^{(i)}\theta^Tx^{(i)} \ge 1, i=1,2,...,m \end{aligned}$
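The relabelling step and the merged constraint can be illustrated with a short sketch. The helper names (`dot`, `to_signed_labels`, `is_feasible`, `objective`) and the toy data are hypothetical and only serve the illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def to_signed_labels(y01):
    """Map labels {0, 1} to {-1, +1}."""
    return [1 if label == 1 else -1 for label in y01]

def is_feasible(theta, X, y_signed):
    """True when y_i * theta^T x_i >= 1 holds for every example."""
    return all(y_i * dot(theta, x_i) >= 1 for x_i, y_i in zip(X, y_signed))

def objective(theta):
    """0.5 * ||theta||^2."""
    return 0.5 * dot(theta, theta)

if __name__ == "__main__":
    X = [[2.0, 1.0], [-1.0, -2.0]]
    y = to_signed_labels([1, 0])
    for theta in ([1.0, 1.0], [0.1, 0.1]):
        print(theta, is_feasible(theta, X, y), objective(theta))
```

Among all feasible $\theta$, the support vector machine picks the one with the smallest `objective` value.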

1.2. Linearly separable support vector machines

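Suppose we can find a hyperplane that separates all positive examples from all negative examples. Such a sample set is called linearly separable.
[Figure: svm1]
Among the separating hyperplanes there is a unique one that maximizes the minimum distance from the sample points to the hyperplane.
Let this hyperplane be:
$\theta^Tx+b=0$
The distance from each point to the hyperplane is:
$\gamma_i = y^{(i)}\frac{\theta^Tx^{(i)}+b}{\|\theta\|}$
Multiplying by $y^{(i)}$ keeps $\gamma_i$ positive for every correctly classified point.
Let $\gamma$ be the minimum distance, attained at some point $x^{(k)}$:
$\gamma = \min_i \gamma_i=y^{(k)}\frac{\theta^Tx^{(k)}+b}{\|\theta\|}$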
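The linearly separable support vector machine is therefore the solution of:
$\begin{aligned} \max_{\theta,b}\ \ \ \ &\gamma \\ \text{s.t.} \ \ \ \ &y^{(i)}\frac{\theta^Tx^{(i)}+b}{\|\theta\|} \ge \gamma, i=1,2,...,m \end{aligned}$
which is equivalent to:
$\begin{aligned} \max_{\theta,b}\ \ \ \ &K\frac{1}{\|\theta\|} \\ \text{s.t.} \ \ \ \ &\frac{1}{\|\theta\|\gamma}y^{(i)}(\theta^Tx^{(i)}+b) \ge 1, i=1,2,...,m \end{aligned}$
where $K=y^{(k)}(\theta^Tx^{(k)}+b)=\|\theta\|\gamma$. Scaling $\theta$ and $b$ by the same positive factor does not change the hyperplane, so we can divide both by $\|\theta\|\gamma$ and take the results as the new $\theta$ and $b$, which fixes $K=1$. Maximizing $\frac{1}{\|\theta\|}$ is the same as minimizing $\frac{1}{2}\|\theta\|^2$, so the problem becomes:
$\begin{aligned} \min_{\theta,b}\ \ \ \ &\frac{1}{2}\|\theta\|^2 \\ \text{s.t.} \ \ \ \ &y^{(i)}(\theta^Tx^{(i)}+b) \ge 1, i=1,2,...,m \end{aligned}$
This has the same form as the expression in 1.1, and it is the constrained problem that a linearly separable support vector machine solves.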
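The rescaling argument can be checked on a small example. The sketch below uses hypothetical helper names and toy data; it computes the minimum distance $\gamma$, divides $\theta$ and $b$ by $K=\|\theta\|\gamma$, and confirms that the hyperplane's margin is unchanged and now equals $\frac{1}{\|\theta\|}$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def functional_margin(theta, b, X, y):
    """K = min_i y_i * (theta^T x_i + b)."""
    return min(y_i * (dot(theta, x_i) + b) for x_i, y_i in zip(X, y))

def geometric_margin(theta, b, X, y):
    """gamma = K / ||theta||, the minimum distance to the hyperplane."""
    return functional_margin(theta, b, X, y) / math.sqrt(dot(theta, theta))

def rescale(theta, b, X, y):
    """Divide theta and b by K so the closest point has functional margin 1.

    Assumes the hyperplane separates the data, so K > 0.
    """
    K = functional_margin(theta, b, X, y)
    return [t / K for t in theta], b / K

if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
    y = [1, 1, -1, -1]
    theta, b = [2.0, 2.0], -4.0
    new_theta, new_b = rescale(theta, b, X, y)
    print(geometric_margin(theta, b, X, y))
    print(geometric_margin(new_theta, new_b, X, y))
    print(1.0 / math.sqrt(dot(new_theta, new_theta)))
```

All three printed values agree, which is why maximizing $\gamma$ reduces to minimizing $\|\theta\|$ once the functional margin is fixed at 1.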

2. Kernel functions

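The formulation above applies only when the samples are linearly separable. For samples that are not linearly separable we introduce kernel functions, which map the samples into a feature space where they can become linearly separable.
Gaussian kernel
$f(x)=e^{-\frac{(x-l)^2}{2\delta^2}}$
Here $l$ is a landmark, and for a vector $x$ the term $(x-l)^2$ denotes the squared Euclidean distance $\|x-l\|^2$. When $x$ is close to $l$, $f(x) \to 1$; when $x$ is far from $l$, $f(x) \to 0$. The larger $\delta$ is, the more slowly the curve falls off.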
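A minimal sketch of the Gaussian kernel follows, reading $(x-l)^2$ as the squared Euclidean distance. The function name `gaussian_kernel` and the sample points are illustrative.

```python
import math

def gaussian_kernel(x, l, delta):
    """exp(-||x - l||^2 / (2 * delta^2)) for equal-length sequences x and l."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2.0 * delta ** 2))

if __name__ == "__main__":
    landmark = [0.0, 0.0]
    print(gaussian_kernel([0.0, 0.0], landmark, delta=1.0))  # 1.0 at the landmark
    print(gaussian_kernel([3.0, 4.0], landmark, delta=1.0))  # close to 0 far away
    print(gaussian_kernel([3.0, 4.0], landmark, delta=5.0))  # larger delta decays more slowly
```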
Taking every training input as a landmark gives $m$ kernel features $f_1, f_2, \ldots, f_m$. The problem becomes:
$\begin{aligned} \min_{\theta,b}\ \ \ \ &\frac{1}{2}\|\theta\|^2 \\ \text{s.t.} \ \ \ \ &y^{(i)}(\theta^Tf(x^{(i)})+b) \ge 1, i=1,2,...,m \end{aligned}$
where:
$\theta = [\theta_1\ \theta_2\ ...\ \theta_m]^T\\ f(x^{(i)})=[e^{-\frac{(x^{(i)}-x^{(1)})^2}{2\delta^2}}\ e^{-\frac{(x^{(i)}-x^{(2)})^2}{2\delta^2}}\ ...\ e^{-\frac{(x^{(i)}-x^{(m)})^2}{2\delta^2}}]^T$
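The feature vector $f(x^{(i)})$ can be built by evaluating the Gaussian kernel against every training point. The sketch below uses hypothetical names and toy one-dimensional data; each row of the resulting $m \times m$ matrix is one $f(x^{(i)})$.

```python
import math

def gaussian_kernel(x, l, delta):
    """exp(-||x - l||^2 / (2 * delta^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2.0 * delta ** 2))

def kernel_features(x, landmarks, delta):
    """f(x): similarity of x to each landmark, in landmark order."""
    return [gaussian_kernel(x, l, delta) for l in landmarks]

def kernel_feature_matrix(X, delta):
    """Row i is f(x_i), using every training point as a landmark."""
    return [kernel_features(x, X, delta) for x in X]

if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    F = kernel_feature_matrix(X, delta=1.0)
    for row in F:
        print([round(v, 3) for v in row])
```

Each example now has $m$ features instead of $n$, the diagonal entries are 1 because every point is its own landmark, and the matrix is symmetric.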

3. When to use a support vector machine

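Logistic regression and a support vector machine without a kernel are essentially the same. Whether to use them or a kernel support vector machine can be decided from the number of features $n$ and the number of training examples $m$.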

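If $n \gg m$, the training set is too small to fit a complex nonlinear model; use logistic regression or a support vector machine without a kernel.
If $n$ is small and $m$ is moderate, $n\in(1, 1000), m\in(10, 10000)$, use a support vector machine with a Gaussian kernel.
If $n$ is small and $m$ is large, $n\in(1, 1000), m > 50000$, a support vector machine with a Gaussian kernel is very slow to train. Add more features, then use logistic regression or a support vector machine without a kernel.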
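These rules of thumb can be encoded as a small helper. This is only a sketch: the function name `suggest_model` is hypothetical, and the test for $n \gg m$ (here `n >= 10 * m`) is an assumption, since the article gives no numeric threshold. Combinations that the rules do not cover, such as $m$ between 10000 and 50000, are reported as such rather than guessed.

```python
def suggest_model(n, m, much_greater_factor=10):
    """Suggest a model from feature count n and training-set size m.

    Assumption: "n >> m" is approximated by n >= much_greater_factor * m.
    """
    if n >= much_greater_factor * m:
        return "logistic regression or SVM without a kernel"
    if 1 < n < 1000 and 10 < m < 10000:
        return "SVM with a Gaussian kernel"
    if 1 < n < 1000 and m > 50000:
        return "add features, then logistic regression or SVM without a kernel"
    return "not covered by the rules of thumb"

if __name__ == "__main__":
    for n, m in [(10000, 100), (50, 5000), (50, 100000), (50, 20000)]:
        print(n, m, "->", suggest_model(n, m))
```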