Selected Exercises from Chapter 6 of Machine Learning (the "watermelon book")

Exercise 6.1

Prove that the distance from any point $\bm{x}$ in sample space to the hyperplane $(\bm{w}, b)$ is given by Eq. (6.2).

Let $\bm{x}'$ be the projection of $\bm{x}$ onto the hyperplane, so that $\bm{w}^{\text{T}}\bm{x}' + b = 0$. The normal vector of the hyperplane is $\bm{w}$, which is parallel to $\bm{x} - \bm{x}'$, so

$$|\bm{w}^{\text{T}}(\bm{x} - \bm{x}')| = \|\bm{w}\| \cdot \|\bm{x} - \bm{x}'\| = \|\bm{w}\| \cdot r.$$

At the same time,

$$|\bm{w}^{\text{T}}(\bm{x} - \bm{x}')| = |\bm{w}^{\text{T}}\bm{x} + b - \bm{w}^{\text{T}}\bm{x}' - b| = |\bm{w}^{\text{T}}\bm{x} + b|,$$

and therefore

$$r = \frac{|\bm{w}^{\text{T}}\bm{x} + b|}{\|\bm{w}\|}.$$
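As a quick numeric sanity check of Eq. (6.2) (the hyperplane $\bm{w}$, $b$ and point $\bm{x}$ below are made-up values, not taken from the book):

import numpy as np

# made-up hyperplane (w, b) and point x, only to check r = |w^T x + b| / ||w||
w = np.array([3.0, 4.0])
b = -2.0
x = np.array([1.0, 2.0])

r = abs(w @ x + b) / np.linalg.norm(w)

# cross-check: project x onto the hyperplane and measure the distance directly
x_proj = x - ((w @ x + b) / (w @ w)) * w
assert np.isclose(w @ x_proj + b, 0.0)            # x_proj lies on the hyperplane
assert np.isclose(np.linalg.norm(x - x_proj), r)  # its distance to x equals r
print(r)  # 1.8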

Exercise 6.2

Using LIBSVM, train an SVM on watermelon data set $3.0\alpha$ with a linear kernel and with a Gaussian kernel, and compare their support vectors.

For installing libsvm for Python, see: link
For using libsvm for Python, see: link

The code for this exercise is as follows:

data_x = [{1:0.697, 2:0.46}, {1:0.774, 2:0.376}, {1:0.634, 2:0.264}, {1:0.608, 2:0.318},
          {1:0.556, 2:0.215}, {1:0.403, 2:0.237}, {1:0.481, 2:0.149}, {1:0.437, 2:0.211},
          {1:0.666, 2:0.091}, {1:0.243, 2:0.267}, {1:0.245, 2:0.057}, {1:0.343, 2:0.099},
          {1:0.639, 2:0.161}, {1:0.657, 2:0.198}, {1:0.36, 2:0.37}, {1:0.593, 2:0.042},
          {1:0.719, 2:0.103},]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

import svm
import svmutil

print('Linear kernel:')

# linear kernel: sweep C and print the training accuracy whenever a new best (>= 70%) is found
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x, isKernel=True)
    param = svm.svm_parameter('-t 0 -c %d -q' % c_param)
    model = svmutil.svm_train(prob, param, '-q')
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:
        c_acc = p_acc[0]
        print(c_acc)

print('\nGaussian kernel:')

# Gaussian (RBF) kernel
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x, isKernel=True)
    param = svm.svm_parameter('-t 2 -c %d -q' % c_param)
    model = svmutil.svm_train(prob, param, '-q')
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:
        c_acc = p_acc[0]
        print(c_acc)

print('\nPolynomial kernel:')

# degree-2 polynomial kernel (-d 2)
c_acc = 0
for c_param in range(1, 10000, 100):
    prob = svm.svm_problem(data_y, data_x, isKernel=True)
    param = svm.svm_parameter('-t 1 -d 2 -c %d -q' % c_param)
    model = svmutil.svm_train(prob, param, '-q')
    p_label, p_acc, p_val = svmutil.svm_predict(data_y, data_x, model, '-q')
    if p_acc[0] >= 70 and p_acc[0] > c_acc:
        c_acc = p_acc[0]
        print(c_acc)
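The exercise asks to compare the support vectors of the two kernels. A sketch along the following lines prints them for a fixed C (it assumes the trained model object exposes get_nr_sv() and get_sv_indices(), as in recent libsvm Python bindings; older bindings may differ):

# compare the support vectors chosen by the linear and Gaussian kernels at the same C
for name, kernel_opt in [('linear', '-t 0'), ('gaussian', '-t 2')]:
    prob = svm.svm_problem(data_y, data_x)
    param = svm.svm_parameter('%s -c 100 -q' % kernel_opt)
    model = svmutil.svm_train(prob, param)
    print(name, 'number of SVs:', model.get_nr_sv(),
          '1-based sample indices:', model.get_sv_indices())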

A visualization of the results for this exercise can be found at the link; working through it yourself gives a better feel for parameter tuning. In general, once the kernel is chosen, the LIBSVM parameters that need tuning are -c and -g. For the role of -c, see the following explanation:

The C value controls the soft margin. A larger C means the data themselves are weighted more heavily: the soft margin shrinks and classification errors are tolerated less. An infinite C means no margin violations are allowed at all, i.e., the SVM becomes a hard-margin SVM. With the Gaussian kernel, as C is gradually increased the data set can eventually be separated completely, without a single error.
(Author: qdbszsj, CSDN, original post: https://blog.csdn.net/qdbszsj/article/details/79124276)

Using a neural network with 3 neurons in the hidden layer, the training error after training can also reach 0, the same as the Gaussian-kernel SVM. Compared with a simple perceptron, SVMs and neural networks still have a huge advantage.
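The network code is not included here; a minimal sketch of that experiment, assuming scikit-learn's MLPClassifier (results vary with initialization), might look like this:

from sklearn.neural_network import MLPClassifier

# watermelon data set 3.0α: (density, sugar content) features and labels
X = [[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
     [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
     [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
     [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
     [0.719, 0.103]]
y = [1] * 8 + [0] * 9

# a single hidden layer with 3 neurons, as described above
clf = MLPClassifier(hidden_layer_sizes=(3,), max_iter=20000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy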

For the linear kernel, $\kappa(\bm{x}, \bm{x}') = \bm{x}^{\text{T}}\bm{x}' = x_1 x'_1 + x_2 x'_2 + \cdots + x_n x'_n = \phi(\bm{x})^{\text{T}}\phi(\bm{x}')$, so clearly $\phi(\bm{x}) = \bm{x}$. Similarly, one can show that the Gaussian kernel maps into an infinite-dimensional space:

$$\begin{aligned} \kappa(\bm{x}_i, \bm{x}_j) &= \exp\Big(-\frac{(x^{(1)}_i - x^{(1)}_j)^2 + \cdots + (x^{(m)}_i - x^{(m)}_j)^2}{2\sigma^2}\Big) \\ &= \exp\Big(-\frac{\|\bm{x}_i\|^2}{2\sigma^2}\Big) \cdot \exp\Big(-\frac{\|\bm{x}_j\|^2}{2\sigma^2}\Big) \cdot \exp\Big(\frac{\bm{x}_i^{\text{T}}\bm{x}_j}{\sigma^2}\Big), \end{aligned}$$

$$\begin{aligned} \exp\Big(\frac{\bm{x}_i^{\text{T}}\bm{x}_j}{\sigma^2}\Big) &= \sum_{n=0}^\infty \frac{(\bm{x}_i^{\text{T}}\bm{x}_j)^n}{\sigma^{2n}\, n!} \\ &= \sum_{n=0}^\infty \sum_{n_1 + \cdots + n_m = n} \binom{n}{n_1, \cdots, n_m} \frac{(x_i^{(1)} x_j^{(1)})^{n_1} \cdots (x_i^{(m)} x_j^{(m)})^{n_m}}{\sigma^{2n}\, n!} \quad \text{(multinomial theorem)}, \end{aligned}$$

so a corresponding feature map, with one coordinate for every $n$ and every multi-index $n_1 + \cdots + n_m = n$, is

$$\phi(\bm{x}) = \Big\{ \exp\Big(-\frac{\|\bm{x}\|^2}{2\sigma^2}\Big) \sqrt{\binom{n}{n_1, \cdots, n_m} \frac{1}{\sigma^{2n}\, n!}}\, \big(x^{(1)}\big)^{n_1} \cdots \big(x^{(m)}\big)^{n_m} \Big\}_{n = 0, \cdots, \infty;\; n_1 + \cdots + n_m = n}.$$

The derivation shows that the Gaussian kernel implicitly contains infinitely many polynomial kernels.
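A quick numeric check of the factorization and the series expansion above ($\sigma$ is a made-up value; the two points are samples from data set $3.0\alpha$):

import numpy as np
from math import factorial

sigma = 0.5
xi = np.array([0.697, 0.460])
xj = np.array([0.666, 0.091])

# left-hand side: the Gaussian kernel itself
gauss = np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

# right-hand side: the two norm factors times the truncated series for exp(x_i^T x_j / sigma^2)
series = sum((xi @ xj) ** n / (sigma ** (2 * n) * factorial(n)) for n in range(30))
approx = np.exp(-xi @ xi / (2 * sigma ** 2)) * np.exp(-xj @ xj / (2 * sigma ** 2)) * series

print(gauss, approx)  # the two values agree to within floating-point error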

Exercise 6.4

Discuss the conditions under which linear discriminant analysis (LDA) and the linear SVM are equivalent.

The two are equivalent when the linear SVM separating hyperplane $(\bm{w}, b)$ is perpendicular to the LDA projection direction $\bm{w}'$ (i.e., the SVM normal vector $\bm{w}$ is parallel to $\bm{w}'$). In the figure below, the blue line is the LDA projection direction and the black line is the linear SVM separating hyperplane.

[Figure: LDA projection direction (blue line) and linear SVM separating hyperplane (black line)]

Exercise 6.6

Analyze why SVMs are sensitive to noise.

After training, most of the training samples need not be kept: the final model depends only on the support vectors. If a noisy sample becomes a support vector, it can severely distort the final model, which is why SVMs are sensitive to noise.
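As an illustration (not part of the original answer, and using scikit-learn's SVC rather than LIBSVM): flipping the label of a single point deep inside one class makes that point a support vector and visibly shifts the separating hyperplane.

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

clean = SVC(kernel='linear', C=1000).fit(X, y)

# inject a single mislabelled point deep inside the positive cluster
X_noisy = np.vstack([X, [[2.0, 2.0]]])
y_noisy = np.append(y, -1)
noisy = SVC(kernel='linear', C=1000).fit(X_noisy, y_noisy)

print(clean.coef_, clean.intercept_)
print(noisy.coef_, noisy.intercept_)   # the hyperplane moves
print(40 in noisy.support_)            # the noisy point (index 40) is a support vector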

Exercise 6.9

Use the kernel trick to extend logistic regression to "kernel logistic regression".

Applying the kernel method, we have

$$\ln \frac{y}{1-y} = \bm{w}^{\text{T}}\phi(\bm{x}) + b = \bm{\beta}^{\text{T}}\phi(\hat{\bm{x}}),$$

where $\bm{\beta} = (\bm{w}; b)$ and $\hat{\bm{x}} = (\bm{x}; 1)$, and the loss function is

$$\ell(\bm{\beta}) = \sum^m_{i=1} \Big( -y_i \bm{\beta}^{\text{T}}\phi(\hat{\bm{x}}_i) + \ln\big(1 + e^{\bm{\beta}^{\text{T}}\phi(\hat{\bm{x}}_i)}\big) \Big).$$

By Theorem 6.1 (the representer theorem),

$$\ln \frac{y}{1-y} = h(\hat{\bm{x}}) = \sum^m_{i=1} \alpha_i \kappa(\hat{\bm{x}}, \hat{\bm{x}}_i),$$

so the loss function becomes

$$\ell(\bm{\alpha}) = \sum^m_{i=1} \Big( -y_i \sum^m_{j=1} \alpha_j \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) \Big) + \sum^m_{i=1} \ln\Big(1 + e^{\sum^m_{j=1} \alpha_j \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)}\Big).$$

Updating the parameters by batch gradient descent uses the gradient

$$\frac{\partial \ell}{\partial \alpha_j} = -\sum^m_{i=1} y_i \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) + \sum^m_{i=1} \frac{\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)}{1 + e^{\sum^m_{k=1} \alpha_k \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_k)}} \cdot e^{\sum^m_{k=1} \alpha_k \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_k)}.$$

In my experiments the results were quite unsatisfactory (the loss oscillated severely).

Here we modify the loss $\ell$ so that each update maximizes a single likelihood term (see Li Hang's Statistical Learning Methods):

$$p(y_i \,|\, \bm{x}_i; \bm{w}, b) = p_1(\hat{\bm{x}}_i; \bm{\beta})^{y_i} \cdot p_0(\hat{\bm{x}}_i; \bm{\beta})^{1-y_i},$$

which correspondingly gives, for a single sample $i$,

$$\frac{\partial \ell}{\partial \alpha_j} = -y_i \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j) + \frac{\kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_j)}{1 + e^{\sum^m_{k=1} \alpha_k \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_k)}} \cdot e^{\sum^m_{k=1} \alpha_k \kappa(\hat{\bm{x}}_i, \hat{\bm{x}}_k)}.$$

The code is as follows (it uses the data from Exercise 3.3, so the result can be compared against plain logistic regression):

import numpy as np
import math

data_x = [[0.697, 0.460, 1], [0.774, 0.376, 1], [0.634, 0.264, 1], [0.608, 0.318, 1], [0.556, 0.215, 1],
          [0.403, 0.237, 1],
          [0.481, 0.149, 1], [0.437, 0.211, 1],
          [0.666, 0.091, 1], [0.243, 0.267, 1], [0.245, 0.057, 1], [0.343, 0.099, 1], [0.639, 0.161, 1],
          [0.657, 0.198, 1],
          [0.360, 0.370, 1], [0.593, 0.042, 1], [0.719, 0.103, 1]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Gaussian (RBF) kernel matrix: kappa[i, j] = exp(-gamma * ||x_i - x_j||^2)
kappa = np.mat(np.zeros([len(data_x), len(data_x)]))
gamma = 1.0 / 0.3

x = np.mat(data_x).T
y = np.mat(data_y).T

for i in range(len(data_x)):
    for j in range(len(data_x)):
        tmp_vec = x[:, i] - x[:, j]
        kappa[i, j] = (tmp_vec.T * tmp_vec)[0, 0]
kappa = -gamma * kappa
kappa = np.exp(kappa)

# initialize alpha and compute the initial value of the loss ell
alpha = np.mat(np.zeros([len(data_x), 1]))
ell = 0
for i in range(len(data_x)):
    ell = ell + -y[i, 0] * alpha.T * kappa[i, :].T + math.log(1 + math.exp(alpha.T * kappa[i, :].T))

steps = 150
learn_rate = 0.01

for vs in range(steps):
    exit_flag = False
    for i in range(len(data_x)):
        # stochastic gradient for sample i: d ell / d alpha = kappa[i, :]^T * (p1_i - y_i)
        tmp_exp = (alpha.T * kappa[i, :].T)[0, 0]
        first_da = -y[i, 0] * kappa[i, :].T
        second_da = kappa[i, :].T / (1.0 + math.exp(tmp_exp)) * math.exp(tmp_exp)
        alpha = alpha - learn_rate * (first_da + second_da)
        # recompute the full loss and stop early if it increases noticeably (oscillation)
        last_ell = ell
        ell = 0
        for k in range(len(data_x)):
            ell = ell + -y[k, 0] * alpha.T * kappa[k, :].T + math.log(1 + math.exp(alpha.T * kappa[k, :].T))
        if ell > last_ell and math.fabs((ell - last_ell)[0, 0]) > 0.1:
            exit_flag = True
            break
    if exit_flag:
        break

# training accuracy of the fitted kernel logistic regression
correct_rate = 0
for j in range(len(data_x)):
    y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
    if y_ >= 0.5 and y[j] == 1 or y_ < 0.5 and y[j] == 0:
        correct_rate = correct_rate + 1
print(1.0 * correct_rate / len(data_y))

The hyperparameters of gradient descent are hard to tune, the oscillation problem persists in experiments, and the algorithm is fairly slow; the training-set accuracy eventually stabilizes around 0.8235.

Rewriting the code to use the Newton's method introduced in the book, the solution steps are exactly the same (the code below is vectorized, so the steps are not obvious from reading it; it is recommended to work through the derivation by hand):

import numpy as np
import math

data_x = [[0.697, 0.460, 1], [0.774, 0.376, 1], [0.634, 0.264, 1], [0.608, 0.318, 1], [0.556, 0.215, 1],
          [0.403, 0.237, 1],
          [0.481, 0.149, 1], [0.437, 0.211, 1],
          [0.666, 0.091, 1], [0.243, 0.267, 1], [0.245, 0.057, 1], [0.343, 0.099, 1], [0.639, 0.161, 1],
          [0.657, 0.198, 1],
          [0.360, 0.370, 1], [0.593, 0.042, 1], [0.719, 0.103, 1]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Gaussian (RBF) kernel matrix: kappa[i, j] = exp(-gamma * ||x_i - x_j||^2)
kappa = np.mat(np.zeros([len(data_x), len(data_x)]))
gamma = 1.0 / 0.3

x = np.mat(data_x).T
y = np.mat(data_y).T

for i in range(len(data_x)):
    for j in range(len(data_x)):
        tmp_vec = x[:, i] - x[:, j]
        kappa[i, j] = (tmp_vec.T * tmp_vec)[0, 0]
kappa = -gamma * kappa
kappa = np.exp(kappa)

# initialize alpha and compute the initial value of the loss ell
alpha = np.mat(np.zeros([len(data_x), 1]))
ell = 0
for i in range(len(data_x)):
    ell = ell + -y[i, 0] * alpha.T * kappa[i, :].T + math.log(1 + math.exp(alpha.T * kappa[i, :].T))

steps = 50

for s in range(steps):
    # p1[i] is the predicted probability p(y=1 | x_i) under the current alpha
    tmp_exp = np.exp(alpha.T * kappa.T)
    p1 = (tmp_exp / (1 + tmp_exp)).T
    # gradient of ell with respect to alpha
    second_da = -kappa * (y - p1)
    # Hessian of ell with respect to alpha
    first_da = np.zeros([len(data_x), len(data_x)])
    for i in range(len(data_x)):
        first_da = first_da + kappa[i, :].T * kappa[i, :] * p1[i, 0] * (1 - p1[i, 0])

    # invert the Hessian through its SVD
    u, sigmav, vt = np.linalg.svd(first_da)
    sigma = np.zeros([len(sigmav), len(sigmav)])
    for i in range(len(sigmav)):
        sigma[i][i] = sigmav[i]
    sigma = np.mat(sigma)
    first_da_inv = vt.T * sigma.I * u.T

    # Newton update; the loop below then stops early once training accuracy exceeds 0.8
    alpha = alpha - first_da_inv * second_da
    correct_rate = 0
    for j in range(len(data_x)):
        y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
        if y_ >= 0.5 and y[j] == 1 or y_ < 0.5 and y[j] == 0:
            correct_rate = correct_rate + 1
    if 1.0 * correct_rate / len(data_y) > 0.8:
        break

# final training accuracy
correct_rate = 0
for j in range(len(data_x)):
    y_ = 1.0 / (1.0 + math.exp(-(alpha.T * kappa[j, :].T)[0, 0]))
    if y_ >= 0.5 and y[j] == 1 or y_ < 0.5 and y[j] == 0:
        correct_rate = correct_rate + 1
print(1.0 * correct_rate / len(data_y))

The algorithm converges very quickly, and the training-set accuracy reaches 1.0. Implementing Newton's method, gradient descent, and batch gradient descent yourself gives a much deeper appreciation of how they behave.

Acknowledgements

Exercise 6.2 draws on:
https://blog.csdn.net/qdbszsj/article/details/79124276 — thanks to @qdbszsj
https://zhuanlan.zhihu.com/p/49023182 — thanks to @我是韩小琦
https://www.cnblogs.com/lliuye/p/9622996.html — thanks to @LLLiuye
https://blog.csdn.net/shengerjianku/article/details/54237376 — thanks to @shengerjianku

Exercise 6.4 draws on:
https://zhuanlan.zhihu.com/p/29016921 — thanks to @钢珠子

Exercise 6.9 draws on:
https://sine-x.com/machine-learning-2/#第6章-支持向量机 — thanks to @sin(x)
https://blog.csdn.net/icefire_tyh/article/details/52135526 — thanks to @四去六进一
