SVM (CS229 Part V) Study Notes

  1. margins and the idea of separating data with a large “gap”
  2. optimal margin classifier
  3. kernels, which give a way to apply SVMs in very high dimensional feature spaces
  4. SMO algorithm, an implementation of SVMs

Notation

Use y ∈ {−1, 1} (instead of {0, 1}) to denote the class labels.
Use parameters $w, b$, and write the classifier as $h_{w,b}(x) = g(w^T x + b)$, where $g(z) = 1$ if $z \geq 0$, and $g(z) = -1$ otherwise.
Drop the convention of letting $x_0 = 1$ be an extra coordinate in the input feature vector. Thus, $b$ takes the role of $\theta_0$, and $w$ is $[\theta_1 \cdots \theta_n]^T$.
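As a quick illustration, here is a minimal Python sketch of this classifier (the names `predict`, `w`, and `b` are illustrative, not from the notes):

```python
import numpy as np

def predict(w, b, x):
    """h_{w,b}(x) = g(w^T x + b), with g(z) = 1 if z >= 0 and -1 otherwise."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Example: a 2D linear classifier.
w = np.array([2.0, -1.0])
b = 0.5
print(predict(w, b, np.array([1.0, 1.0])))   # w^T x + b = 1.5 >= 0  -> 1
print(predict(w, b, np.array([-1.0, 3.0])))  # w^T x + b = -4.5 < 0  -> -1
```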

Functional and geometric margins

Given a training example $(x^{(i)}, y^{(i)})$, we define the functional margin of $(w, b)$ with respect to the training example as $\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$.
A large functional margin means a confident and correct prediction.
Given a training example $(x^{(i)}, y^{(i)})$, we define the geometric margin of $(w, b)$ with respect to the training example as $\gamma^{(i)} = y^{(i)}\left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$.
The geometric margin is invariant to rescaling of the parameters; i.e., if we replace w with 2w and b with 2b, then the geometric margin does not change. Thus, when trying to fit w and b to training data, we can impose an arbitrary scaling constraint on w without changing anything important.
Finally, given a training set $S = \{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, we define the geometric margin of $(w, b)$ with respect to $S$ to be the smallest of the geometric margins on the individual training examples: $\gamma = \min_i \gamma^{(i)}$.
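To make the two definitions concrete, here is a small Python sketch (function names are illustrative) that computes both margins for a fixed $(w, b)$ and checks the scale invariance of the geometric margin:

```python
import numpy as np

def functional_margins(w, b, X, y):
    # gamma_hat^{(i)} = y^{(i)} (w^T x^{(i)} + b)
    return y * (X @ w + b)

def geometric_margins(w, b, X, y):
    # gamma^{(i)} = y^{(i)} ((w / ||w||)^T x^{(i)} + b / ||w||)
    return functional_margins(w, b, X, y) / np.linalg.norm(w)

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -0.5

print(geometric_margins(w, b, X, y).min())          # geometric margin w.r.t. the set S
print(geometric_margins(2 * w, 2 * b, X, y).min())  # same value: invariant to rescaling
```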

Optimal margin classifier

Goal: find a decision boundary that maximizes the geometric margin, since this would reflect a very confident set of predictions on the training set and a good “fit” to the training data.
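In the notes this goal is first posed as maximizing $\gamma$ subject to every example having margin at least $\gamma$; using the scale invariance above to fix the functional margin to 1, it can be rewritten (following the standard CS229 derivation) as a convex quadratic program:

```latex
\min_{w,\, b}\ \frac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1,\quad i = 1, \ldots, m
```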

Lagrange duality

This leads us to the dual form of the optimization problem, a standard technique for solving constrained optimization problems.
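As a sketch following the standard derivation in the notes, applying Lagrange duality to the optimal margin problem gives the dual below; note that the data appear only through inner products, which is what later makes kernels possible:

```latex
\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \left\langle x^{(i)}, x^{(j)} \right\rangle
\quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0
```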

Kernels

A kernel is a function that measures the similarity between two examples.
Gaussian kernel: $K(x, z) = \exp\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)$
If $K$ is a valid kernel (also called a Mercer kernel), i.e., if it corresponds to some feature mapping $\phi$, then the corresponding kernel matrix $K$ is symmetric positive semidefinite.
Theorem (Mercer): Let $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be given. Then for $K$ to be a valid (Mercer) kernel, it is necessary and sufficient that for any $\{x^{(1)}, \ldots, x^{(m)}\}$ ($m < \infty$), the corresponding kernel matrix is symmetric positive semidefinite.
If you have any learning algorithm that you can write in terms of only inner products ⟨x, z⟩ between input attribute vectors, then by replacing this with K(x, z) where K is a kernel, you can “magically” allow your algorithm to work efficiently in the high dimensional feature space corresponding to K.
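A short Python sketch of these two points: build the Gaussian kernel matrix for a small sample of points and check that it is symmetric positive semidefinite (function names are illustrative):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def kernel_matrix(X, kernel):
    m = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

X = np.random.randn(5, 3)                      # 5 examples in R^3
K = kernel_matrix(X, gaussian_kernel)

print(np.allclose(K, K.T))                     # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-10)) # eigenvalues >= 0 (PSD up to rounding)
```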

Regularization and non-separable case

To make the algorithm work on datasets that are not linearly separable, and to make it less sensitive to outliers, we add regularization, which also reduces overfitting.
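Concretely, the notes do this with $\ell_1$ regularization: slack variables $\xi_i$ let examples violate the margin, and the parameter $C$ controls how heavily violations are penalized:

```latex
\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m
```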

SMO algorithm

Sequential Minimal Optimization, an efficient way of solving the dual problem arising from the derivation of the SVM.
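The sketch below shows only the heart of SMO in Python: holding all other $\alpha_k$ fixed, jointly re-optimize one pair $(\alpha_i, \alpha_j)$ subject to the dual's box and equality constraints. The pair-selection heuristics and the outer convergence loop are omitted, and the function and variable names (`smo_pair_update`, `kernel`) are illustrative:

```python
import numpy as np

def smo_pair_update(i, j, alpha, b, X, y, C, kernel, tol=1e-5):
    """One SMO step: re-optimize (alpha[i], alpha[j]) with all other alphas fixed."""
    if i == j:
        return alpha, b, False

    m = len(y)
    K = lambda a, c: kernel(X[a], X[c])
    # f(x) = sum_k alpha_k y_k K(x_k, x) + b, and E = f(x) - y
    f = lambda k: sum(alpha[t] * y[t] * K(t, k) for t in range(m)) + b
    E_i, E_j = f(i) - y[i], f(j) - y[j]

    # Feasible interval [L, H] for the new alpha[j], implied by
    # sum_k alpha_k y_k = 0 and 0 <= alpha_k <= C.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L >= H:
        return alpha, b, False

    # eta is the second derivative of the dual objective along the constraint line.
    eta = 2.0 * K(i, j) - K(i, i) - K(j, j)
    if eta >= 0:
        return alpha, b, False

    # Unconstrained optimum for alpha[j], then clip to [L, H].
    aj_new = float(np.clip(alpha[j] - y[j] * (E_i - E_j) / eta, L, H))
    if abs(aj_new - alpha[j]) < tol:
        return alpha, b, False
    ai_new = alpha[i] + y[i] * y[j] * (alpha[j] - aj_new)

    # Recompute the threshold b so the KKT conditions hold on the updated pair.
    b1 = b - E_i - y[i] * (ai_new - alpha[i]) * K(i, i) - y[j] * (aj_new - alpha[j]) * K(i, j)
    b2 = b - E_j - y[i] * (ai_new - alpha[i]) * K(i, j) - y[j] * (aj_new - alpha[j]) * K(j, j)
    if 0 < ai_new < C:
        b = b1
    elif 0 < aj_new < C:
        b = b2
    else:
        b = (b1 + b2) / 2.0

    alpha[i], alpha[j] = ai_new, aj_new
    return alpha, b, True
```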
