Exercise 6.1
Prove that the distance from an arbitrary point $\bm{x}$ in the sample space to the hyperplane $(\bm{w}, b)$ is given by Eq. (6.2).
Suppose the projection of $\bm{x}$ onto the hyperplane is $\bm{x}'$; then $\bm{w}^{\text{T}} \bm{x}' + b = 0$. The normal vector of the hyperplane is $\bm{w}$, which is parallel to $\bm{x} - \bm{x}'$, so
$$|\bm{w}^{\text{T}} (\bm{x} - \bm{x}')| = \| \bm{w} \| \cdot \| \bm{x} - \bm{x}' \| = \| \bm{w} \| \cdot r,$$
while
$$| \bm{w}^{\text{T}} (\bm{x} - \bm{x}') | = |\bm{w}^{\text{T}} \bm{x} + b - \bm{w}^{\text{T}} \bm{x}' - b| = |\bm{w}^{\text{T}} \bm{x} + b|,$$
and therefore
$$r = \frac{|\bm{w}^{\text{T}} \bm{x} + b|}{\| \bm{w} \|}.$$
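As a quick numerical check of this formula, the following sketch (with an arbitrary hyperplane and point, not taken from the book) computes the distance both via Eq. (6.2) and via the explicit projection onto the hyperplane:

import numpy as np

# Hypothetical hyperplane w^T x + b = 0 and an arbitrary query point
w = np.array([3.0, 4.0])
b = -2.0
x = np.array([1.0, 2.0])

# Distance via Eq. (6.2)
r = abs(w @ x + b) / np.linalg.norm(w)

# Distance via the explicit projection x' = x - (w^T x + b) / ||w||^2 * w
x_proj = x - (w @ x + b) / (w @ w) * w
print(r, np.linalg.norm(x - x_proj))  # both print 1.8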
Exercise 6.2
Using LIBSVM, train an SVM on the watermelon dataset 3.0$\alpha$ with a linear kernel and with a Gaussian kernel, and compare the differences between their support vectors.
For installing libsvm for Python, see: link
For using libsvm for Python, see: link
The code for this exercise is as follows:
import svm
import svmutil

# Watermelon dataset 3.0α: feature 1 = density, feature 2 = sugar content
data_x = [{1: 0.697, 2: 0.46}, {1: 0.774, 2: 0.376}, {1: 0.634, 2: 0.264}, {1: 0.608, 2: 0.318},
          {1: 0.556, 2: 0.215}, {1: 0.403, 2: 0.237}, {1: 0.481, 2: 0.149}, {1: 0.437, 2: 0.211},
          {1: 0.666, 2: 0.091}, {1: 0.243, 2: 0.267}, {1: 0.245, 2: 0.057}, {1: 0.343, 2: 0.099},
          {1: 0.639, 2: 0.161}, {1: 0.657, 2: 0.198}, {1: 0.36, 2: 0.37}, {1: 0.593, 2: 0.042},
          {1: 0.719, 2: 0.103}]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

print('Linear kernel:')
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x)
    param = svm.svm_parameter('-t 0 -c %d -q' % c_param)        # -t 0: linear kernel
    model = svmutil.svm_train(prob, param)
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:                      # track the best training accuracy so far
        c_acc = p_acc[0]
        print(c_acc)

print('\nGaussian kernel:')
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x)
    param = svm.svm_parameter('-t 2 -c %d -q' % c_param)        # -t 2: RBF (Gaussian) kernel
    model = svmutil.svm_train(prob, param)
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:
        c_acc = p_acc[0]
        print(c_acc)

print('\nPolynomial kernel:')
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x)
    param = svm.svm_parameter('-t 1 -d 2 -c %d -q' % c_param)   # -t 1: polynomial kernel of degree 2
    model = svmutil.svm_train(prob, param)
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:
        c_acc = p_acc[0]
        print(c_acc)
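Since the exercise asks for a comparison of the support vectors, the trained models can also be inspected directly. A minimal sketch, assuming the standard accessors get_nr_sv() and get_sv_indices() of the libsvm Python svm_model and an arbitrary C of 100:

# Inspect which training samples become support vectors under each kernel
prob = svm.svm_problem(data_y, data_x)
for t, name in [(0, 'linear'), (2, 'rbf')]:
    param = svm.svm_parameter('-t %d -c 100 -q' % t)
    model = svmutil.svm_train(prob, param)
    # get_sv_indices() returns 1-based positions of the support vectors in data_x
    print(name, 'number of SVs:', model.get_nr_sv(), 'indices:', model.get_sv_indices())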
The visualization results for this exercise can be found at: link. Hands-on practice gives a better feel for the tuning process. In general, once the kernel function is chosen, the LIBSVM parameters that need tuning are -c and -g. On the role of -c, see the following (quoted from @qdbszsj on CSDN, https://blog.csdn.net/qdbszsj/article/details/79124276):
The C value controls the soft margin: the larger C is, the more the data themselves matter, the smaller the soft margin, and the fewer classification errors are tolerated. An infinitely large C means no soft margin at all, i.e. the SVM becomes a hard-margin SVM. With the Gaussian kernel, as C is gradually increased, the dataset can be separated completely without any errors.
Using a neural network with a single hidden layer of 3 neurons, the training error after training can also reach 0, the same as the Gaussian-kernel SVM. Compared with a simple perceptron, the SVM and the neural network still have a huge advantage.
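For the neural-network comparison above, a minimal sketch (assuming scikit-learn is available; the 3-neuron hidden layer matches the description, while the other hyperparameters are illustrative rather than the original setup):

from sklearn.neural_network import MLPClassifier

# Convert the libsvm-style dicts above into plain feature lists
X = [[d[1], d[2]] for d in data_x]
clf = MLPClassifier(hidden_layer_sizes=(3,), max_iter=10000, random_state=0)
clf.fit(X, data_y)
print('training accuracy:', clf.score(X, data_y))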
For the linear kernel,
$$\kappa(\bm{x}, \bm{x}') = \bm{x}^{\text{T}}\bm{x}' = x_1x'_1 + x_2x'_2 + \cdots + x_nx'_n = \phi(\bm{x})^{\text{T}}\phi(\bm{x}'),$$
so clearly $\phi(\bm{x}) = \bm{x}$. Similarly, one can show that the Gaussian kernel maps to an infinite-dimensional space:
$$\begin{aligned} \kappa(\bm{x}_i, \bm{x}_j) &= \exp\Big({-\frac{(x^{(1)}_i - x^{(1)}_j)^2 + \cdots + (x^{(m)}_i - x^{(m)}_j)^2}{2 \sigma^2}}\Big) \\ &= \exp \Big( -\frac{\| \bm{x}_i \|^2}{2\sigma^2} \Big) \cdot \exp \Big( -\frac{\| \bm{x}_j \|^2}{2\sigma^2} \Big) \cdot \exp \Big( \frac{\bm{x}_i^{\text{T}} \bm{x}_j}{\sigma^2} \Big), \end{aligned}$$
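A quick numerical check of this factorization (with arbitrary vectors and an arbitrary σ):

import numpy as np

xi, xj, sigma = np.array([0.5, 1.2]), np.array([-0.3, 0.7]), 0.8
lhs = np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))
rhs = (np.exp(-xi @ xi / (2 * sigma ** 2))
       * np.exp(-xj @ xj / (2 * sigma ** 2))
       * np.exp(xi @ xj / sigma ** 2))
print(lhs, rhs)  # identical up to floating-point error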
The last factor can be expanded further; by the Taylor series of the exponential and the multinomial theorem,
$$\begin{aligned} \exp \Big( \frac{\bm{x}_i^{\text{T}} \bm{x}_j}{\sigma^2} \Big) &= \sum_{n=0}^\infty \frac{ (\bm{x}_i^{\text{T}} \bm{x}_j)^n }{\sigma^{2n}\, n!} \\ &= \sum_{n=0}^\infty \sum_{n_1 + \cdots + n_m = n} \binom{n}{n_1, \cdots, n_m} \frac{(x_i^{(1)}x_j^{(1)})^{n_1} \cdots (x_i^{(m)}x_j^{(m)})^{n_m}}{\sigma^{2n}\, n!}, \end{aligned}$$
so $\phi(\bm{x})$ has one coordinate for every multi-index $(n_1, \cdots, n_m)$ with $n_1 + \cdots + n_m = n$, $n = 0, 1, \cdots, \infty$:
$$\phi_{n_1, \cdots, n_m}(\bm{x}) = \exp \Big( -\frac{\| \bm{x} \|^2}{2\sigma^2} \Big) \sqrt{\binom{n}{n_1, \cdots, n_m} \frac{1}{\sigma^{2n}\, n!}}\; \big(x^{(1)}\big)^{n_1} \cdots \big(x^{(m)}\big)^{n_m}.$$
As this derivation shows, the Gaussian kernel in fact implicitly contains polynomial kernels of every degree, i.e. infinitely many polynomial kernels.
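To make the infinite-dimensional mapping concrete, the sketch below (with m = 2, an arbitrary σ, and arbitrary test points; phi and max_n are illustrative names of mine) builds the explicit feature map truncated at a finite degree and checks that $\phi(\bm{x}_i)^{\text{T}}\phi(\bm{x}_j)$ approaches $\kappa(\bm{x}_i, \bm{x}_j)$ as the truncation degree grows:

import numpy as np
from math import exp, factorial, comb

def phi(x, sigma, max_n):
    # Truncated explicit feature map of the Gaussian kernel for m = 2:
    # one coordinate per multi-index (n1, n2) with n1 + n2 = n <= max_n
    pref = exp(-(x[0] ** 2 + x[1] ** 2) / (2 * sigma ** 2))
    feats = []
    for n in range(max_n + 1):
        for n1 in range(n + 1):
            n2 = n - n1
            coef = comb(n, n1) / (factorial(n) * sigma ** (2 * n))  # multinomial coefficient / (n! * sigma^(2n))
            feats.append(pref * np.sqrt(coef) * x[0] ** n1 * x[1] ** n2)
    return np.array(feats)

xi, xj, sigma = np.array([0.5, 1.2]), np.array([-0.3, 0.7]), 1.0
kappa_ij = exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))
for max_n in (2, 5, 10):
    # the truncated inner product converges to kappa_ij as max_n grows
    print(max_n, phi(xi, sigma, max_n) @ phi(xj, sigma, max_n), kappa_ij)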
Exercise 6.4
Discuss under what conditions linear discriminant analysis (LDA) and a linear support vector machine are equivalent.
The two are equivalent when the linear SVM separating hyperplane is perpendicular to the LDA projection direction $\bm{w}'$ (equivalently, when the SVM normal vector $\bm{w}$ is parallel to $\bm{w}'$). For example, in the figure referenced in the original post (not included here), the blue line is the LDA projection direction and the black line is the linear SVM separating hyperplane.
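A small sketch of this condition (assuming scikit-learn; the two-Gaussian toy data is arbitrary): compare the LDA projection direction with the normal vector of a linear SVM trained on the same points. When the SVM hyperplane is perpendicular to the LDA direction, the two unit vectors have cosine close to 1.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2) + [2, 2], rng.randn(30, 2) + [-2, -2]])
y = np.array([1] * 30 + [-1] * 30)

lda = LinearDiscriminantAnalysis().fit(X, y)
svc = SVC(kernel='linear', C=1.0).fit(X, y)

w_lda = lda.coef_[0] / np.linalg.norm(lda.coef_[0])   # LDA projection direction
w_svm = svc.coef_[0] / np.linalg.norm(svc.coef_[0])   # normal vector of the SVM hyperplane
print('cosine between the two directions:', abs(w_lda @ w_svm))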
Exercise 6.6
Analyze why SVMs are sensitive to noise.
After training is complete, most training samples need not be retained: the final model depends only on the support vectors. When a noisy sample becomes a support vector, it severely distorts the final model, which is why the SVM is sensitive to noise.
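A hedged sketch of this effect (assuming scikit-learn; the toy data and the injected point are arbitrary): with a large C, i.e. close to a hard margin, adding a single mislabeled point near the boundary may noticeably move the hyperplane, because that point becomes a support vector.

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) + [2, 0], rng.randn(20, 2) + [-2, 0]])
y = np.array([1] * 20 + [-1] * 20)

clean = SVC(kernel='linear', C=1e3).fit(X, y)              # large C: nearly hard margin
X_noisy = np.vstack([X, [[1.5, 0.0]]])                     # one noisy point deep on the positive side
y_noisy = np.append(y, -1)                                 # ...but labeled negative
noisy = SVC(kernel='linear', C=1e3).fit(X_noisy, y_noisy)

print('clean w, b:', clean.coef_[0], clean.intercept_[0])
print('noisy w, b:', noisy.coef_[0], noisy.intercept_[0])  # compare how much the hyperplane moved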
Exercise 6.9
Use the kernel trick to generalize logistic regression, producing "kernel logistic regression".
Applying the kernel method, we have
$$\ln \frac{y}{1-y} = \bm{w}^{\text{T}} \phi(\bm{x}) + b = \bm{\beta}^{\text{T}} \phi(\hat{\bm{x}}),$$
where $\bm{\beta} = (\bm{w}; b)$ and $\hat{\bm{x}} = (\bm{x}; 1)$, and the loss function is
$$\ell(\bm{\beta}) = \sum^m_{i=1} \Big( -y_i \bm{\beta}^{\text{T}}\phi(\hat{\bm{x}}_i) + \ln \big(1 + e^{\bm{\beta}^{\text{T}}\phi(\hat{\bm{x}}_i)}\big) \Big).$$
By Theorem 6.1 (the representer theorem),
$$\ln \frac{y}{1-y} = h(\hat{\bm{x}}) = \sum^m_{i=1} \alpha_i \kappa(\hat{\bm{x}}, \hat{\bm{x}}_i),$$
so the loss function becomes
$$\ell(\bm{\alpha}) = \sum^m_{i=1} \Big( -y_i \sum^m_{j=1} \alpha_j \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) \Big) + \sum^m_{i=1} \ln \Big( 1 + e^{\sum^m_{j=1} \alpha_j \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)} \Big),$$
Using batch gradient descent to update the parameters, the gradient is
$$\frac{\partial \ell}{\partial \alpha_j} = -\sum^m_{i=1} y_i \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) + \sum^m_{i=1} \frac{\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)}{1+e^{\sum^m_{l=1} \alpha_l \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_l)}} \cdot e^{\sum^m_{l=1} \alpha_l \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_l)}$$
(the summation index in the exponent is written as $l$ to avoid clashing with the free index $j$).
In experiments this worked poorly (severe oscillation occurred).
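For reference, a minimal vectorized sketch of this batch gradient (batch_grad, K, and the 0/1 label vector y are my own names and assumptions; K[i][j] stands for $\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)$):

import numpy as np

def batch_grad(alpha, K, y):
    # alpha: (m,) coefficients, K: (m, m) kernel matrix, y: (m,) labels in {0, 1}
    s = K @ alpha                       # s_i = sum_j alpha_j * kappa(x_i, x_j)
    p = np.exp(s) / (1.0 + np.exp(s))   # sigmoid(s_i)
    # d ell / d alpha_l = sum_i kappa(x_i, x_l) * (p_i - y_i); K is symmetric
    return K.T @ (p - y)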
The loss function $\ell$ was therefore changed to maximize a single likelihood term at a time (see Li Hang's《统计学习方法》, Statistical Learning Methods, for reference):
$$p(y_i \mid \bm{x}_i; \bm{w}, b) = p_1(\hat{\bm{x}}_i ; \bm{\beta})^{y_i} \cdot p_0(\hat{\bm{x}}_i ; \bm{\beta})^{1-y_i},$$
which correspondingly gives the per-sample gradient
$$\frac{\partial \ell}{\partial \alpha_j} = -y_i \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) + \frac{\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)}{1+e^{\sum^m_{l=1} \alpha_l \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_l)}} \cdot e^{\sum^m_{l=1} \alpha_l \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_l)}.$$
The code is shown below; the same data as in Exercise 3.3 can be used, so the result can be compared with plain logistic regression.
import numpy as np
import math

# Watermelon dataset 3.0α; the trailing 1 is the constant feature, so each row is x_hat = (x; 1)
data_x = [[0.697, 0.460, 1], [0.774, 0.376, 1], [0.634, 0.264, 1], [0.608, 0.318, 1], [0.556, 0.215, 1],
          [0.403, 0.237, 1], [0.481, 0.149, 1], [0.437, 0.211, 1],
          [0.666, 0.091, 1], [0.243, 0.267, 1], [0.245, 0.057, 1], [0.343, 0.099, 1], [0.639, 0.161, 1],
          [0.657, 0.198, 1], [0.360, 0.370, 1], [0.593, 0.042, 1], [0.719, 0.103, 1]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

m = len(data_x)
x = np.mat(data_x).T
y = np.mat(data_y).T
gamma = 1.0 / 0.3

# Precompute the Gaussian kernel matrix kappa[i, j] = exp(-gamma * ||x_i - x_j||^2)
kappa = np.mat(np.zeros([m, m]))
for i in range(m):
    for j in range(m):
        tmp_vec = x[:, i] - x[:, j]
        kappa[i, j] = (tmp_vec.T * tmp_vec)[0, 0]
kappa = np.exp(-gamma * kappa)

def loss(alpha):
    # ell(alpha) = sum_i ( -y_i * s_i + ln(1 + e^{s_i}) ), with s_i = sum_j alpha_j * kappa(x_i, x_j)
    val = 0.0
    for k in range(m):
        s = (alpha.T * kappa[k, :].T)[0, 0]
        val += -y[k, 0] * s + math.log(1 + math.exp(s))
    return val

alpha = np.mat(np.zeros([m, 1]))
ell = loss(alpha)
steps = 150
learn_rate = 0.01
for vs in range(steps):
    exit_flag = False
    for i in range(m):
        # Per-sample (stochastic) gradient of the i-th likelihood term w.r.t. alpha
        s_i = (alpha.T * kappa[i, :].T)[0, 0]
        first_da = -y[i, 0] * kappa[i, :].T
        second_da = kappa[i, :].T * math.exp(s_i) / (1.0 + math.exp(s_i))
        alpha = alpha - learn_rate * (first_da + second_da)
        last_ell = ell
        ell = loss(alpha)
        if ell > last_ell and math.fabs(ell - last_ell) > 0.1:
            exit_flag = True   # stop if the loss jumps upward (oscillation)
            break
    if exit_flag:
        break

correct_rate = 0
for j in range(m):
    y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
    if (y_ >= 0.5 and y[j, 0] == 1) or (y_ < 0.5 and y[j, 0] == 0):
        correct_rate += 1
print(1.0 * correct_rate / len(data_y))
The hyperparameters of gradient descent are hard to tune; in experiments the oscillation problem persisted and the algorithm was slow, with the training-set accuracy eventually settling around 0.8235.
The code was then rewritten to use Newton's method as introduced in the book; the solution steps are exactly the same (the code below is vectorized, so the steps are less obvious; deriving them by hand is recommended).
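For reference, the Newton iteration that the vectorized code below implements is the same one the book uses for logistic regression in Chapter 3, with $\bm{\beta}$ replaced by $\bm{\alpha}$, writing $\bm{\kappa}_i = \big(\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_1), \cdots, \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_m)\big)^{\text{T}}$ and $p_1(\hat{\bm{x}}_i) = \frac{e^{\bm{\alpha}^{\text{T}}\bm{\kappa}_i}}{1 + e^{\bm{\alpha}^{\text{T}}\bm{\kappa}_i}}$:
$$\begin{aligned} \frac{\partial \ell}{\partial \bm{\alpha}} &= -\sum_{i=1}^m \bm{\kappa}_i \big(y_i - p_1(\hat{\bm{x}}_i)\big), \\ \frac{\partial^2 \ell}{\partial \bm{\alpha}\, \partial \bm{\alpha}^{\text{T}}} &= \sum_{i=1}^m \bm{\kappa}_i \bm{\kappa}_i^{\text{T}}\, p_1(\hat{\bm{x}}_i)\big(1 - p_1(\hat{\bm{x}}_i)\big), \\ \bm{\alpha} &\leftarrow \bm{\alpha} - \Big(\frac{\partial^2 \ell}{\partial \bm{\alpha}\, \partial \bm{\alpha}^{\text{T}}}\Big)^{-1} \frac{\partial \ell}{\partial \bm{\alpha}}. \end{aligned}$$
With the kernel matrix precomputed, these expressions translate directly into the code.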
import numpy as np
import math

# Same data and Gaussian kernel matrix as above
data_x = [[0.697, 0.460, 1], [0.774, 0.376, 1], [0.634, 0.264, 1], [0.608, 0.318, 1], [0.556, 0.215, 1],
          [0.403, 0.237, 1], [0.481, 0.149, 1], [0.437, 0.211, 1],
          [0.666, 0.091, 1], [0.243, 0.267, 1], [0.245, 0.057, 1], [0.343, 0.099, 1], [0.639, 0.161, 1],
          [0.657, 0.198, 1], [0.360, 0.370, 1], [0.593, 0.042, 1], [0.719, 0.103, 1]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

m = len(data_x)
x = np.mat(data_x).T
y = np.mat(data_y).T
gamma = 1.0 / 0.3
kappa = np.mat(np.zeros([m, m]))
for i in range(m):
    for j in range(m):
        tmp_vec = x[:, i] - x[:, j]
        kappa[i, j] = (tmp_vec.T * tmp_vec)[0, 0]
kappa = np.exp(-gamma * kappa)

alpha = np.mat(np.zeros([m, 1]))
steps = 50
for s in range(steps):
    # p1[i] = sigmoid(sum_j alpha_j * kappa(x_i, x_j))
    tmp_exp = np.exp(alpha.T * kappa.T)
    p1 = (tmp_exp / (1 + tmp_exp)).T
    # Gradient and Hessian of ell w.r.t. alpha
    grad = -kappa * (y - p1)
    hess = np.mat(np.zeros([m, m]))
    for i in range(m):
        hess = hess + kappa[i, :].T * kappa[i, :] * p1[i, 0] * (1 - p1[i, 0])
    # Invert the Hessian through its SVD
    u, sv, vt = np.linalg.svd(hess)
    hess_inv = vt.T * np.mat(np.diag(1.0 / sv)) * u.T
    # Newton update
    alpha = alpha - hess_inv * grad
    # Early stop once the training accuracy exceeds 0.8
    correct_rate = 0
    for j in range(m):
        y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
        if (y_ >= 0.5 and y[j, 0] == 1) or (y_ < 0.5 and y[j, 0] == 0):
            correct_rate += 1
    if 1.0 * correct_rate / len(data_y) > 0.8:
        break

correct_rate = 0
for j in range(m):
    y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
    if (y_ >= 0.5 and y[j, 0] == 1) or (y_ < 0.5 and y[j, 0] == 0):
        correct_rate += 1
print(1.0 * correct_rate / len(data_y))
The algorithm converges quickly, and the training-set accuracy reaches 1.0. Implementing Newton's method, stochastic gradient descent, and batch gradient descent yourself, through hands-on practice, gives a much deeper feel for how they behave.
Acknowledgements
Exercise 6.2 references:
https://blog.csdn.net/qdbszsj/article/details/79124276 (thanks to @qdbszsj)
https://zhuanlan.zhihu.com/p/49023182 (thanks to @我是韩小琦)
https://www.cnblogs.com/lliuye/p/9622996.html (thanks to @LLLiuye)
https://blog.csdn.net/shengerjianku/article/details/54237376 (thanks to @shengerjianku)
Exercise 6.4 references:
https://zhuanlan.zhihu.com/p/29016921 (thanks to @钢珠子)
Exercise 6.9 references:
https://sine-x.com/machine-learning-2/#第6章-支持向量机 (thanks to @sin(x))
https://blog.csdn.net/icefire_tyh/article/details/52135526 (thanks to @四去六进一)