10-SVM

DawnRanger

Category: machine-learning

Article link: https://blog.csdn.net/DawnRanger/article/details/48168855

1 - Optimization Objective

1.1 Starting from logistic regression

First, consider the hypothesis function of logistic regression: $h_\theta(x)=\dfrac{1}{1+e^{-\theta^Tx}}$. From it we can see:

- When y=1, we want $h_\theta(x)\approx1$, which requires $\theta^Tx\gg0$
- When y=0, we want $h_\theta(x)\approx0$, which requires $\theta^Tx\ll0$
Cost for a single example:
$\begin{aligned} &-y\log h_\theta(x)-(1-y)\log(1-h_\theta(x)) \\ =&-y\log\frac{1}{1+e^{-\theta^Tx}}-(1-y)\log\Big(1-\frac{1}{1+e^{-\theta^Tx}}\Big) \end{aligned}$
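Plotting $-\log\dfrac{1}{1+e^{-z}}$ and $-\log(1-\dfrac{1}{1+e^{-z}})$ as functions of $z$ gives the two logistic cost curves.
Let the magenta curves in the original figure be $cost_1(z)$ and $cost_0(z)$ respectively; the SVM uses these two functions to measure the error.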
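
The two pairs of curves can also be compared numerically. The sketch below is only an illustration: it assumes that $cost_1(z)$ and $cost_0(z)$ take the piecewise-linear forms $\max(0, 1-z)$ and $\max(0, 1+z)$, which the text does not spell out (the exact slope of the magenta curves is an assumption). The function names are hypothetical.

```python
import math

def logistic_cost_y1(z):
    """-log(1 / (1 + e^-z)), rewritten as log(1 + e^-z)."""
    return math.log1p(math.exp(-z))

def logistic_cost_y0(z):
    """-log(1 - 1 / (1 + e^-z)), rewritten as log(1 + e^z)."""
    return math.log1p(math.exp(z))

def cost1(z):
    """Assumed SVM cost for y = 1: exactly zero once z >= 1."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Assumed SVM cost for y = 0: exactly zero once z <= -1."""
    return max(0.0, 1.0 + z)

# Print both costs side by side for a few values of z.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, logistic_cost_y1(z), cost1(z), logistic_cost_y0(z), cost0(z))
```

Under this assumption, the logistic cost never reaches zero, while the SVM cost is exactly zero beyond $z=1$ (for y=1) or $z=-1$ (for y=0).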

1.2 Optimization objective

Logistic regression:

$\min_\theta\frac{1}{m}\bigg[ \sum\limits_{i=1}^m y^{(i)}\Big(-\log h_\theta(x^{(i)})\Big)+ (1-y^{(i)})\Big(-\log(1-h_\theta(x^{(i)}))\Big)\bigg] +\frac{\lambda}{2m}\sum\limits_{j=1}^n\theta_j^2$
SVM:

$\min_\theta C \sum\limits_{i=1}^m\bigg[y^{(i)}cost_1(\theta^Tx^{(i)})+ (1-y^{(i)})cost_0(\theta^Tx^{(i)})\bigg]+\frac{1}{2}\sum\limits_{j=1}^n\theta_j^2$
SVM hypothesis:

$h_\theta(x)=\begin{cases} 1 & \text{if } \theta^Tx\geq0 \\ 0 & \text{otherwise} \end{cases}$
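
As a rough illustration of the SVM objective and hypothesis above, the following sketch evaluates them on a tiny hand-made dataset. It assumes the piecewise-linear forms $\max(0, 1-z)$ and $\max(0, 1+z)$ for $cost_1$ and $cost_0$, and that each example carries a leading $x_0=1$; the data, the parameter values and all function names are made up for the example.

```python
def cost1(z):
    # Assumed form of cost_1: zero when z >= 1.
    return max(0.0, 1.0 - z)

def cost0(z):
    # Assumed form of cost_0: zero when z <= -1.
    return max(0.0, 1.0 + z)

def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def svm_objective(theta, X, y, C):
    """C * sum_i [y cost1 + (1 - y) cost0] + 1/2 * sum_{j>=1} theta_j^2."""
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = dot(theta, x_i)
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    # theta[0] is the intercept and is not regularized (the sum starts at j = 1).
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term

def predict(theta, x):
    """SVM hypothesis: 1 if theta^T x >= 0, otherwise 0."""
    return 1 if dot(theta, x) >= 0 else 0

# Hypothetical data: each row is [x_0 = 1, x_1].
X = [[1.0, 2.0], [1.0, -2.0], [1.0, 0.5]]
y = [1, 0, 1]
theta = [0.0, 1.0]

print(svm_objective(theta, X, y, C=1.0))
print([predict(theta, x) for x in X])
```

Here only the third example (with $\theta^Tx=0.5$) contributes to the data term: it is classified correctly but lies inside the margin.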

2 - Large Margin Intuition

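First, recall the SVM optimization objective defined in Section 1.2.

Compared with logistic regression, the cost function has changed noticeably:

- For an example with y=1, we want $\theta^Tx\geq1$, no longer just $\theta^Tx\geq0$
- For an example with y=0, we want $\theta^Tx\leq-1$, no longer just $\theta^Tx<0$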

The SVM's decision boundary therefore generally differs from that of logistic regression. Its main characteristic is a large margin, which is why the SVM is also called a large margin classifier.
[Figure: margin]

This particular cost function makes the SVM choose the decision boundary with the largest margin.
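
One way to see where the margin comes from, sketched under the assumptions that $C$ is very large and the training set is linearly separable: the optimizer then drives the first term of the objective to zero, and the problem reduces to

$\begin{aligned} \min_\theta\; & \frac{1}{2}\sum\limits_{j=1}^n\theta_j^2 \\ \text{s.t.}\; & \theta^Tx^{(i)}\geq 1 \quad \text{if } y^{(i)}=1 \\ & \theta^Tx^{(i)}\leq -1 \quad \text{if } y^{(i)}=0 \end{aligned}$

Writing $w=(\theta_1,\dots,\theta_n)$, the distance from $x^{(i)}$ to the boundary $\theta^Tx=0$ is $\dfrac{|\theta^Tx^{(i)}|}{\|w\|}\geq\dfrac{1}{\|w\|}$, so minimizing $\sum_j\theta_j^2=\|w\|^2$ pushes every training example at least $1/\|w\|$ away from the boundary, with $1/\|w\|$ as large as possible.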

3 - Kernels

3.1 What is a kernel

Consider the following classification problem:
[Figure: classification]

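When choosing features for logistic regression, we used polynomial terms such as $x_1,x_2,x_1x_2,x_1^2,x_2^2$, and used $\theta^Tx$ as the decision criterion for $h_\theta$.

For the SVM, we use a different way of constructing features, called a kernel.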

Kernels and similarity:
[Figure: kernel]

We choose n points as landmarks and use $similarity(x,l^{(i)})$ to measure how similar a sample x is to the landmark $l^{(i)}$. Each similarity is used as a feature, denoted $f_i$; the similarity function is the kernel.
$f_i = similarity(x,l^{(i)})=\exp\Big(- \frac{\|x-l^{(i)}\|^2}{2\sigma^2}\Big) \quad (\text{Gaussian kernel})$
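Hypothesis (here with three landmarks):
- Predict "1" when $\theta_0+\theta_1f_1+\theta_2f_2+\theta_3f_3 \geq 0$

About the Gaussian kernel:
- If $x\approx l^{(i)}$, then $f_i\approx\exp(-\dfrac{0^2}{2\sigma^2})= 1$
- If x is far from $l^{(i)}$, then $f_i = \exp(-\dfrac{(large \; number)^2}{2\sigma^2})\approx 0$
- On the choice of $\sigma^2$: it controls how quickly $f_i$ falls off as x moves away from $l^{(i)}$; a larger $\sigma^2$ gives a slower, smoother decay.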
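
The behaviour described in the list above can be checked with a few lines of code. This is a minimal sketch of the Gaussian kernel formula; the landmark, the sample points and the function name `gaussian_kernel` are made up for illustration.

```python
import math

def gaussian_kernel(x, landmark, sigma2):
    """f = exp(-||x - l||^2 / (2 * sigma^2)); sigma2 is sigma squared."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma2))

l1 = [3.0, 5.0]          # hypothetical landmark
near = [3.0, 5.0]        # identical to the landmark
far = [10.0, -4.0]       # far away from the landmark
nearby = [4.0, 6.0]      # squared distance 2 from the landmark

print(gaussian_kernel(near, l1, 1.0))   # similarity of a point with itself
print(gaussian_kernel(far, l1, 1.0))    # very close to 0

# Same point, different sigma^2: larger sigma^2 means slower decay.
for sigma2 in (0.5, 1.0, 3.0):
    print(sigma2, gaussian_kernel(nearby, l1, sigma2))
```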

3.2 Applying kernels

Given $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$,

choose $l^{(1)}=x^{(1)},l^{(2)}=x^{(2)},\dots,l^{(m)}=x^{(m)}$.
For a training example $(x^{(i)},y^{(i)})$:
$\begin{aligned} f_1^{(i)}&=similarity(x^{(i)},l^{(1)}) \\ f_2^{(i)}&=similarity(x^{(i)},l^{(2)}) \\ &\;\;\vdots \\ f_m^{(i)}&=similarity(x^{(i)},l^{(m)}) \end{aligned}$
Note that $f_i^{(i)}=similarity(x^{(i)},l^{(i)})=1$, because $l^{(i)}=x^{(i)}$.
Hypothesis:
Predict $y=1$ if $\theta^Tf\geq0$, where $f \in \mathbb{R}^{m+1}$, $f= \begin{bmatrix} f_0\\f_1\\ \vdots \\ f_m \end{bmatrix}$ and $f_0=1$.
Optimization objective:

$\min_\theta C \sum\limits_{i=1}^m\bigg[y^{(i)}cost_1(\theta^Tf^{(i)})+ (1-y^{(i)})cost_0(\theta^Tf^{(i)})\bigg]+\frac{1}{2}\sum\limits_{j=1}^m\theta_j^2$
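
The feature construction above can be sketched as follows: every training example becomes a landmark, and each example is mapped to a vector of $m+1$ kernel features. The training points, the parameter vector `theta` and the helper names are hypothetical; `theta` is hand-picked for the demonstration, not the result of training.

```python
import math

def gaussian_kernel(x, landmark, sigma2):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma2))

def kernel_features(x, landmarks, sigma2):
    """Return f = [f_0, f_1, ..., f_m] with f_0 = 1 as the intercept feature."""
    return [1.0] + [gaussian_kernel(x, l, sigma2) for l in landmarks]

def predict(theta, x, landmarks, sigma2):
    """Predict y = 1 if theta^T f >= 0, otherwise 0."""
    f = kernel_features(x, landmarks, sigma2)
    return 1 if sum(t * fi for t, fi in zip(theta, f)) >= 0 else 0

# Hypothetical training inputs; the landmarks are the training examples themselves.
X_train = [[1.0, 1.0], [2.0, 1.5], [6.0, 5.0]]
landmarks = X_train
sigma2 = 1.0

# Row i holds the features f^(i) of training example i.
F = [kernel_features(x, landmarks, sigma2) for x in X_train]
for row in F:
    print(row)

# Hand-picked parameters (m + 1 = 4 entries): favour points near the first two landmarks.
theta = [-0.5, 1.0, 1.0, 0.0]
print([predict(theta, x, landmarks, sigma2) for x in X_train])
```

Each row has $m+1$ entries, and the entry matching the example's own landmark equals 1, as noted in the text.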
Choosing the SVM parameters:
- Choice of C
  - Large C: lower bias, high variance.
  - Small C: higher bias, low variance.
- Choice of $\sigma^2$
  - Large $\sigma^2$: the feature $f_i$ varies smoothly; higher bias, low variance.
  - Small $\sigma^2$: the feature $f_i$ varies sharply; lower bias, high variance.

4 - Using An SVM

4.1 Basic usage

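Use a software package (e.g. liblinear, libsvm, …) to solve for the parameters θ.
You need to specify:

- The parameter C
- The kernel
  - No kernel ("linear kernel"): predict "y=1" if $\theta^Tx\geq0$. Suitable when n is large and m is small.
  - Gaussian kernel: requires choosing $\sigma^2$. Suitable when n is small and m is large (for very large m, see Section 4.3).
  - Others: polynomial kernel, string kernel, chi-square kernel, histogram intersection kernel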

Important: do not forget feature scaling when preparing the data.
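
As one possible way to put these choices together, the sketch below uses scikit-learn, whose `SVC` class is built on libsvm. The library choice is an assumption of this example, not something the original notes prescribe, and the data is made up. scikit-learn parameterizes the Gaussian (RBF) kernel as $\exp(-\gamma\|x-l\|^2)$, so $\gamma = 1/(2\sigma^2)$.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

sigma2 = 1.0  # sigma squared, chosen arbitrarily for the example

# Scale the features first, then fit an SVM with a Gaussian kernel.
model = make_pipeline(
    StandardScaler(),
    SVC(C=1.0, kernel="rbf", gamma=1.0 / (2.0 * sigma2)),
)

# Hypothetical data: the first feature is on a much larger scale than the second.
X = [
    [1000.0, 1.0], [1200.0, 1.5], [1100.0, 1.2],
    [3000.0, 4.0], [3200.0, 4.5], [3100.0, 4.2],
]
y = [0, 0, 0, 1, 1, 1]

model.fit(X, y)
pred = model.predict([[1050.0, 1.1], [3150.0, 4.4]])
print(pred)
```

Without the scaling step, $\|x-l\|^2$ would be dominated by the first feature, and the second feature would have almost no influence on the kernel values.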

4.2 Multi-class classification

[Figure: multi-class]
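
The original figure is not available, so the following is only a sketch of one common approach, one-vs-all, and it is an assumption that this is what the figure showed. In one-vs-all, a separate classifier with parameters $\theta^{(k)}$ is trained for each class k, and the predicted class is the one with the largest $(\theta^{(k)})^Tx$. The parameter values below are hand-picked for the demonstration, not trained, and classes are indexed from 0.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict_one_vs_all(thetas, x):
    """thetas[k] is the parameter vector for class k; x includes x_0 = 1."""
    scores = [dot(theta, x) for theta in thetas]
    # Return the index of the class with the largest score.
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters for K = 3 classes and two features (plus intercept).
thetas = [
    [0.0, 1.0, 0.0],
    [0.0, -1.0, 1.0],
    [0.0, -1.0, -1.0],
]

samples = [[1.0, 2.0, 0.5], [1.0, -1.0, 3.0], [1.0, -1.0, -3.0]]
print([predict_one_vs_all(thetas, x) for x in samples])
```

Many SVM packages already provide multi-class support, in which case this step does not need to be written by hand.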

4.3 Logistic regression vs. SVM

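Here n is the number of features and m is the number of training examples.

- If n is large and m is small: use logistic regression or an SVM without a kernel ("linear kernel").
- If n is small and m is of intermediate size: use an SVM with a Gaussian kernel.
- If n is small and m is large: first add more features, then use logistic regression or an SVM without a kernel.

A neural network can be used in all of these cases, but it may be slower to train.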
