Andrew Ng · Machine Learning || chap12 Support Vector Machines (brief notes)

12-1 Optimization objective

Alternative view of logistic regression

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

If $y=1$, we want $h_\theta(x)\approx 1$, i.e. $\theta^T x \gg 0$.

If $y=0$, we want $h_\theta(x)\approx 0$, i.e. $\theta^T x \ll 0$.

Cost of a single example: $-\left(y\log h_\theta(x) + (1-y)\log(1-h_\theta(x))\right) = -y\log\frac{1}{1+e^{-\theta^T x}} - (1-y)\log\left(1-\frac{1}{1+e^{-\theta^T x}}\right)$

Support vector machine

Logistic regression:
$\min_\theta \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta(x^{(i)})\right) + (1-y^{(i)})\left(-\log\left(1-h_\theta(x^{(i)})\right)\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
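
For comparison with the SVM objective that follows, a minimal Octave sketch of this regularized logistic-regression cost (the function name `lr_cost` and the vectorized layout are my own; $\theta_0$ is left unregularized, as in earlier chapters):

```matlab
function J = lr_cost(theta, X, y, lambda)
  % X is m x (n+1) with a leading column of ones, y is m x 1 in {0,1}
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                    % h_theta(x) for every example
  J = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
      + (lambda / (2 * m)) * sum(theta(2:end) .^ 2); % theta_0 is not regularized
end
```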
Support vector machine:
$\min_\theta\ C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$

SVM hypothesis
$\min_\theta\ C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$
Hypothesis:

$h_\theta(x)=\begin{cases}1 & \text{if } \theta^T x \ge 0\\0 & \text{otherwise}\end{cases}$
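
A minimal Octave sketch of the objective and hypothesis above, assuming the usual hinge forms $\mathrm{cost}_1(z)=\max(0,1-z)$ and $\mathrm{cost}_0(z)=\max(0,1+z)$ for the piecewise-linear costs (the lecture only defines them by their plots); $\theta_0$ is not regularized:

```matlab
function J = svm_cost(theta, X, y, C)
  % X is m x (n+1) with a leading column of ones, y is m x 1 in {0,1}
  z = X * theta;
  cost1 = max(0, 1 - z);    % penalty applied to positive examples (y = 1)
  cost0 = max(0, 1 + z);    % penalty applied to negative examples (y = 0)
  J = C * sum(y .* cost1 + (1 - y) .* cost0) + 0.5 * sum(theta(2:end) .^ 2);
end
```

The hypothesis is then simply `pred = (X * theta >= 0);`.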

12-2 Large Margin Intuition

Support Vector Machine

If $y=1$, we want $\theta^T x \ge 1$ (not just $\ge 0$).


If $y=0$, we want $\theta^T x \le -1$ (not just $< 0$).
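Taking the usual hinge forms $\mathrm{cost}_1(z)=\max(0,1-z)$ and $\mathrm{cost}_0(z)=\max(0,1+z)$ (as assumed in the sketch in 12-1), the $\pm 1$ thresholds fall out directly:

$\mathrm{cost}_1(z) = 0 \iff z \ge 1, \qquad \mathrm{cost}_0(z) = 0 \iff z \le -1$

So when $C$ is very large, the optimizer tries to drive every cost term to zero, which forces $\theta^T x^{(i)} \ge 1$ whenever $y^{(i)}=1$ and $\theta^T x^{(i)} \le -1$ whenever $y^{(i)}=0$; the remaining $\frac{1}{2}\sum_j\theta_j^2$ term then picks the smallest $\|\theta\|$ that satisfies these constraints.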

SVM Decision Boundary

SVM Decision Boundary: Linearly separable case

Large margin classifier in presence of outliers

12-3 The mathematics behind large margin classification (optional)

Vector Inner Product

SVM Decision Boundary


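A brief worked version of the large-margin argument (a summary sketch of the standard derivation behind these slides, not text recovered from the original figures): for vectors $u$ and $v$, the inner product can be written through the projection $p$ of $v$ onto $u$:

$u^T v = p \cdot \|u\|$

Taking $u = \theta$ and $v = x^{(i)}$ (with $\theta_0 = 0$ for simplicity), the constraints become $p^{(i)}\,\|\theta\| \ge 1$ when $y^{(i)}=1$ and $p^{(i)}\,\|\theta\| \le -1$ when $y^{(i)}=0$, where $p^{(i)}$ is the projection of $x^{(i)}$ onto $\theta$. Since the objective also minimizes $\frac{1}{2}\|\theta\|^2$, the solution makes the projections $|p^{(i)}|$ large, i.e. the decision boundary ends up with a large margin.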

12-4 Kernels I

Kernel functions

Non-linear Decision Boundary

Predict $y=1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots \ge 0$

$h_\theta(x)=\begin{cases}1 & \text{if } \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \cdots \ge 0\\0 & \text{otherwise}\end{cases}$

Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$.

Kernels and Similarity

$f_1 = \mathrm{similarity}(x, l^{(1)}) = \exp\left(-\frac{\|x-l^{(1)}\|^2}{2\sigma^2}\right) = \exp\left(-\frac{\sum_{j=1}^{n}(x_j - l_j^{(1)})^2}{2\sigma^2}\right)$

If $x \approx l^{(1)}$: $f_1 \approx 1$.

If $x$ is far from $l^{(1)}$: $f_1 \approx 0$.
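
A quick numeric check of these two cases in Octave (the landmark and the two test points below are made-up illustration values):

```matlab
sigma  = 1;
l1     = [3; 5];              % landmark l^(1)
x_near = [3.1; 4.9];          % a point close to l^(1)
x_far  = [9; 1];              % a point far from l^(1)
f_near = exp(-sum((x_near - l1).^2) / (2 * sigma^2))   % ~0.99,  close to 1
f_far  = exp(-sum((x_far  - l1).^2) / (2 * sigma^2))   % ~5e-12, close to 0
```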


12-5 Kernels II

Choosing the landmarks

SVM with Kernels

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$,

choose $l^{(1)} = x^{(1)}, l^{(2)} = x^{(2)}, \cdots, l^{(m)} = x^{(m)}$.

Given example x:

$f_1 = \mathrm{similarity}(x, l^{(1)})$
$f_2 = \mathrm{similarity}(x, l^{(2)})$
$\cdots$

For training example $(x^{(i)}, y^{(i)}) \longrightarrow f^{(i)}$
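
A minimal sketch of building these kernel features for a whole training set (the helper name `kernel_features` is my own): every training example is used as a landmark, and the intercept feature $f_0 = 1$ is prepended:

```matlab
function F = kernel_features(X, sigma)
  % X is m x n; returns F, m x (m+1), whose i-th row is f^(i)
  m = size(X, 1);
  F = ones(m, m + 1);                   % column 1 holds the intercept feature f_0 = 1
  for i = 1:m
    for j = 1:m
      F(i, j + 1) = exp(-sum((X(i, :) - X(j, :)).^2) / (2 * sigma^2));
    end
  end
end
```

$\theta$ is then trained against $F$ instead of $X$, using the objective below.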


Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$.

Predict “y=1” if $\theta^T f \ge 0$.

Training: $\min_\theta\ C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T f^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T f^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$ (here $n = m$: one parameter $\theta_j$ per landmark).

SVM parameters:

$C\ \left(=\frac{1}{\lambda}\right)$:

Large $C$: lower bias, high variance (small $\lambda$).
Small $C$: higher bias, low variance (large $\lambda$).

$\sigma^2$:

Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance.

Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance.

12-6 Using an SVM

Use an SVM software package (e.g. liblinear, libsvm, …) to solve for the parameters $\theta$.
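
For instance, a hedged end-to-end sketch assuming libsvm's Octave/MATLAB interface (`svmtrain` / `svmpredict`) is installed; the two choices the package needs, $C$ and the kernel (here Gaussian, which libsvm parameterizes by $\gamma = \frac{1}{2\sigma^2}$), are discussed below:

```matlab
% y is an m x 1 label vector, X is m x n (no intercept column);
% y_test / X_test are a held-out set -- all of these are assumed to exist.
C     = 1;
sigma = 0.5;
gamma = 1 / (2 * sigma^2);             % libsvm's RBF kernel is exp(-gamma*||u-v||^2)
model = svmtrain(y, X, sprintf('-t 2 -c %g -g %g', C, gamma));  % -t 2 = RBF kernel
pred  = svmpredict(y_test, X_test, model);                      % predicted labels
```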

Need to specify:

Choice of parameter C

Choice of kernel (similarity function):

E.g. No kernel (“linear kernel”):

$\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n \ge 0 \longrightarrow$ use when $n$ is large and $m$ is small

Predict “y=1” if $\theta^T x \ge 0$

Gaussian kernel:

$f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$

Need to choose $\sigma^2$.

Kernel (similarity) functions:

```matlab
function f = kernel(x1, x2, sigma)
  % Gaussian (RBF) kernel between feature vectors x1 and x2
  % (sigma passed in explicitly so the function is self-contained)
  f = exp(-sum((x1 - x2).^2) / (2 * sigma^2));
end
```

Note: do perform feature scaling before using the Gaussian kernel (see 4-3 Gradient descent in practice I: Feature Scaling).
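
A minimal sketch of that scaling step (mean normalization, as in 4-3), so that no single large-scale feature dominates the distance $\|x - l\|^2$ inside the kernel:

```matlab
mu        = mean(X);                   % X is m x n
sigma_ftr = std(X);                    % per-feature std (named sigma_ftr to avoid
                                       % clashing with the kernel bandwidth sigma)
X_scaled  = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma_ftr);
```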

Other choices of kernel

Note: not all similarity functions $\mathrm{similarity}(x, l)$ make valid kernels. They need to satisfy a technical condition called “Mercer's Theorem” so that SVM packages' optimizations run correctly and do not diverge.

Many off-the-shelf kernels available:

  • Polynomial kernel: $k(x, l) = (x^T l)^2,\ (x^T l + 1)^3,\ (x^T l + 5)^4$
  • More esoteric: String kernel, chi-square kernel, histogram intersection kernel

Multi-class classification

Logistic regression vs. SVMs

$n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples.
If $n$ is large (relative to $m$):
Use logistic regression, or SVM without a kernel (“linear kernel”)

If n is small, m is intermediate:
Use SVM with Gaussian kernel

If n is small, m is large:
Create/add more features, then use logistic regression or SVM without a kernel

A neural network is likely to work well for most of these settings, but may be slower to train.
