Chapter 7 Support Vector Machines (Large Margin Classifiers)
1 Optimization Objective
- Hypothesis:
$$h_\theta(x)=\begin{cases} 1 & \text{if } \theta^Tx \ge 0\\ 0 & \text{otherwise} \end{cases}$$
- Cost Function:
$$J(\theta)=C\sum_{i=1}^m\left[y^{(i)}\,\text{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\,\text{cost}_0(\theta^Tx^{(i)})\right]+\frac{1}{2}\sum_{j=1}^n\theta_j^2$$
$C$ plays a role similar to $\frac{1}{\lambda}$
- Goal:
$$\mathop{\text{minimize}}\limits_{\theta}\ J(\theta)$$
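To make the objective concrete, here is a minimal sketch (not any package's actual API) of $J(\theta)$ in NumPy, assuming the piecewise-linear $\text{cost}_1$/$\text{cost}_0$ surrogates drawn in the lecture:

```python
import numpy as np

def cost1(z):
    # Surrogate cost when y = 1: zero once z >= 1, linear below that.
    return np.maximum(0, 1 - z)

def cost0(z):
    # Surrogate cost when y = 0: zero once z <= -1, linear above that.
    return np.maximum(0, 1 + z)

def svm_cost(theta, X, y, C):
    """J(theta) = C * sum_i [y_i*cost1(theta^T x_i) + (1 - y_i)*cost0(theta^T x_i)]
    + (1/2) * sum_{j>=1} theta_j^2.
    X includes a leading column of ones; theta[0] is the unregularized intercept."""
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```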
2 Large Margin Intuition
- $$\begin{aligned} &\text{if } y^{(i)}=1\text{, we want } \theta^Tx \ge 1 \text{ (not just } \ge 0\text{)}\\ &\text{if } y^{(i)}=0\text{, we want } \theta^Tx \le -1 \text{ (not just } \le 0\text{)} \end{aligned}$$
3 The Mathematics Behind Large Margins
$$u=\left[\begin{matrix} u_1\\u_2\end{matrix}\right],\quad v=\left[\begin{matrix} v_1\\v_2\end{matrix}\right]$$
- $u^Tv=p \cdot ||u||=u_1v_1+u_2v_2$
$p$: the signed length of the projection of $v$ onto $u$
- $u^Tv=v^Tu$
- $||u||=\sqrt{u_1^2+u_2^2}$
- $\theta_0=0$: this means the decision boundary must pass through the origin $(0,0)$
- The SVM minimizes $||\theta||$ (see the sketch after this list):
$$\begin{aligned} \mathop{\text{minimize}}\limits_{\theta}\ & \frac{1}{2}\sum_{j=1}^n\theta_j^2\\ \text{s.t.}\ \ & p^{(i)} \cdot ||\theta|| \ge 1 && \text{if } y^{(i)}=1\\ & p^{(i)} \cdot ||\theta|| \le -1 && \text{if } y^{(i)}=0 \end{aligned}$$
- Since $||\theta||$ is kept small, the projections $p^{(i)}$ must be large, which is what yields a large margin
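A quick numeric check of the projection identity above, in plain NumPy with arbitrary example vectors:

```python
import numpy as np

# Verify u^T v = p * ||u||, where p is the signed length of the
# projection of v onto u.
u = np.array([2.0, 1.0])
v = np.array([1.0, 3.0])

p = u @ v / np.linalg.norm(u)       # signed projection length of v onto u
print(u @ v)                        # u1*v1 + u2*v2 = 5.0
print(p * np.linalg.norm(u))        # p * ||u||     = 5.0
```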
4 Kernels
- $h_\theta(x)=\theta_1f_1+\theta_2f_2+\cdots+\theta_nf_n$
- Given $x$, compute new features $f$ depending on proximity to landmarks $l$
- Kernel: $f_i=\text{similarity}(x,l^{(i)})$
- Choose the landmarks:
$$l^{(1)}=x^{(1)},\ l^{(2)}=x^{(2)},\ \cdots,\ l^{(m)}=x^{(m)}$$
$$f^{(i)}=\left[\begin{matrix} f_0^{(i)}=1\\ f_1^{(i)}=\text{sim}(x^{(i)},l^{(1)})\\ f_2^{(i)}=\text{sim}(x^{(i)},l^{(2)})\\ \vdots\\ f_i^{(i)}=\text{sim}(x^{(i)},l^{(i)})=e^0=1\\ \vdots\\ f_m^{(i)}=\text{sim}(x^{(i)},l^{(m)}) \end{matrix}\right]$$
4.1 Gaussian Kernel
- $$f_i=\text{similarity}(x,l^{(i)})=\exp\left(-\frac{||x-l^{(i)}||^2}{2\sigma^2}\right)=\exp\left(-\frac{\sum_{j=1}^n(x_j-l_j^{(i)})^2}{2\sigma^2}\right)$$
Despite the name, this has no real connection to the normal distribution
- If $x \approx l^{(i)}$: $f_i \approx 1$
  If $x$ is far from $l^{(i)}$: $f_i \approx 0$
- Perform feature scaling before using the Gaussian kernel
- The bandwidth $\sigma^2$ must be chosen (see the sketch below)
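A minimal sketch of computing the landmark features $f^{(i)}$ with the Gaussian kernel; the function names `gaussian_similarity` and `landmark_features` are made up for illustration:

```python
import numpy as np

def gaussian_similarity(x, l, sigma):
    # f = exp(-||x - l||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

def landmark_features(x, landmarks, sigma):
    """Map one (already feature-scaled) example x to f = [1, f_1, ..., f_m],
    using the training examples themselves as landmarks (l^(i) = x^(i))."""
    f = [gaussian_similarity(x, l, sigma) for l in landmarks]
    return np.concatenate(([1.0], f))   # prepend the intercept feature f_0 = 1
```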
4.2 Linear Kernel
- Means using no similarity function: the raw features $x$ are used directly
- Predict "$y=1$" if $\theta^Tx \ge 0$
5 SVM with Kernels
- Hypothesis: Given $x$, compute features $f\in\mathbb{R}^{m+1}$. Predict "$y=1$" if $\theta^Tf \ge 0$.
- Training:
$$\mathop{\text{min}}\limits_{\theta}\ C\sum_{i=1}^m\left[y^{(i)}\,\text{cost}_1(\theta^Tf^{(i)})+(1-y^{(i)})\,\text{cost}_0(\theta^Tf^{(i)})\right]+\theta^TM\theta$$
where $M$ is a matrix that depends on the chosen kernel; implementations use $\theta^TM\theta$ in place of $\frac{1}{2}\sum_j\theta_j^2$ so the optimization runs more efficiently on large training sets
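In practice one calls an off-the-shelf solver rather than minimizing this objective by hand. A sketch using scikit-learn, whose `rbf` kernel is the Gaussian kernel above with `gamma` $=\frac{1}{2\sigma^2}$; the toy data is made up:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated classes of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9]])
y = np.array([0, 0, 1, 1])

sigma = 1.0
clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2))
clf.fit(X, y)
print(clf.predict([[2.1, 2.0]]))    # -> [1]
```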
6 Effects of the Parameters $C$ and $\sigma^2$
- Large $C$: acts like small $\lambda$, which can lead to overfitting (high variance)
  Small $C$: acts like large $\lambda$, which can lead to underfitting (high bias)
- Large $\sigma^2$: the features $f_i$ vary more smoothly, giving higher bias and lower variance
  Small $\sigma^2$: the features $f_i$ vary less smoothly, giving lower bias and higher variance
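Because of these trade-offs, $C$ and $\sigma^2$ are usually chosen by cross-validation. A sketch with scikit-learn's `GridSearchCV`; the grids are illustrative, and `X_train`/`y_train` are assumed to exist:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],   # small C -> high bias, large C -> high variance
    "gamma": [0.01, 0.1, 1, 10],    # gamma = 1 / (2 * sigma^2)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)
# print(search.best_params_)
```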
7 Using an SVM
- Use an SVM software package to solve for the parameters $\theta$
- Need to specify:
(1) Choice of parameter $C$
(2) Choice of kernel (similarity function)
7.1 Other Choices of Kernel
- Not all similarity functions make valid kernels
They need to satisfy a technical condition called "Mercer's Theorem" so that SVM packages' optimizations run correctly and do not diverge
- Many off-the-shelf kernels:
(1) Polynomial kernel
(2) String kernel
(3) Chi-square kernel
(4) Histogram intersection kernel
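For concreteness, sketches of two of the listed kernels as plain functions; the constant `c` and degree `d` of the polynomial kernel are free parameters chosen here for illustration:

```python
import numpy as np

def polynomial_kernel(x, l, c=1.0, d=3):
    # (x^T l + c)^d
    return (x @ l + c) ** d

def histogram_intersection_kernel(x, l):
    # Sum of elementwise minima (for nonnegative histogram features).
    return np.sum(np.minimum(x, l))
```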
7.2 Multi-class classification
- Train $K$ SVMs, one to distinguish $y=i$ from the rest, for $i=1,2,\cdots,K$, and get $\theta^{(1)},\theta^{(2)},\cdots,\theta^{(K)}$. Pick the class $i$ with the largest $(\theta^{(i)})^Tx$ (see the sketch below)
- Or use the multi-class functionality already built into most SVM packages
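A minimal sketch of the one-vs-all prediction step; `Theta` is assumed to be a $(K, n{+}1)$ array whose $i$-th row is $\theta^{(i)}$:

```python
import numpy as np

def predict_multiclass(Theta, x):
    # Score each class with (theta^(i))^T x and pick the largest.
    scores = Theta @ x
    return int(np.argmax(scores))
```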
8 Logistic Regression vs. Support Vector Machines
$n$: number of features
$m$: number of training examples

| Sizes of $n$ and $m$ | Recommended choice |
|---|---|
| $n \gg m$ | logistic regression, or an SVM without a kernel |
| $n$ small, $m$ moderate | an SVM with a Gaussian kernel |
| $n$ small, $m$ large | create/add more features, then logistic regression or an SVM without a kernel |
- A neural network may be very slow to train in any of the above cases
- A key advantage of the SVM is that its cost function is convex, so there are no local minima
9 References
- Andrew Ng, Machine Learning (Coursera)
- Huang Haiguang, Machine Learning notes