《统计学习方法》系列（2）

最新推荐文章于 2022-06-03 14:52:11 发布

Pikachu5808

最新推荐文章于 2022-06-03 14:52:11 发布

阅读量298

点赞数 1

分类专栏：机器学习文章标签：感知机

本文链接：https://blog.csdn.net/u012294618/article/details/79437562

版权

机器学习专栏收录该内容

6 篇文章 1 订阅

订阅专栏

本篇对应全书第二章，讲的是感知机。感知机（perceptron）是二类分类的线性分类模型，对应于输入空间（特征空间）中将数据进行线性划分的分离超平面，属于判别模型。感知机1957年由Rosenblatt提出，是神经网络与支持向量机的基础。

1、理论讲解

1.1、感知机模型

假设输入空间（特征空间）是 $\mathcal{X}\subseteq{\mathbf{R}^n}$ ，输出空间是 $\mathcal{Y}=\{1,-1\}$ 。输入 $x \in \mathcal{X}$ 表示实例的特征向量，对应于输入空间（特征空间）的点；输出 $y \in \mathcal{Y}$ 表示实例的类别。由输入空间到输出空间的如下函数

f (x) = s i g n (w x + b)

$f(x)=sign(wx+b)$
称为感知机。其中，

w w $w$ 和

b

$b$ 为感知机模型参数，

w∈Rn w ∈ R n $w \in \mathbf{R}^n$ 叫作权值向量（weight vector），

b∈R b ∈ R $b \in \mathbf{R}$ 叫作偏置（bias），sign是符号函数，即

s i g n (x) = {+ 1, x \geq 0 - 1, x < 0

$sign(x)=\begin{cases} +1,x\ge0 \\ -1,x<0 \end{cases}$
感知机模型的假设空间是定义在特征空间中的所有线性分类模型（linear classification model）或线性分类器（linear model），即函数集合

{f|f(x)=wx+b} { f | f ( x ) = w x + b } $\{f|f(x)=wx+b\}$ 。
感知机有如下几何解释：线性方程

w x + b = 0

$wx+b=0$
对应于特征空间

Rn R n $\mathbf{R}^n$ 中的一个超平面S，其中w是超平面的法向量，b是超平面的截距。这个超平面将特征空间划分为两个部分，位于两部分的点（特征向量）分别被分为正、负两类。因此，超平面S称为分离超平面（seperating hyperplane），如图所示

1.2、感知机学习策略

为了确定感知机模型参数w,b，需要定义（经验）损失函数并将损失函数极小化。这里选择的损失函数是所有误分类点到超平面S的总距离，假设超平面S的误分类点集合为M，那么所有误分类点到超平面S的总距离为

- 1 | | w | | \sum x i \in M y i (w x i + b)

$-{\frac{1}{||w||}} \sum_{x_i \in M} y_i(wx_i+b)$
不考虑

1||w|| 1 | | w | | $\frac {1}{||w||}$ ，就得到感知机学习的损失函数

L (w, b) = \sum x i \in M y i (w x i + b)

$L(w,b) =\sum_{x_i \in M} y_i(wx_i+b)$
至于为什么不考虑

1||w|| 1 | | w | | $\frac {1}{||w||}$ ，作者没有给出进一步的解释，读者不妨阅读参考文章1和参考文章2。
感知机学习的策略是在假设空间中选取使损失函数最小的模型参数w,b，即感知机模型。

1.3、感知机学习算法

感知机学习算法是对以下最优化问题的算法

m i n w, b L (w, b) = \sum x i \in M y i (w x i + b)

$\underset {w,b}{min} \quad L(w,b) =\sum_{x_i \in M} y_i(wx_i+b)$
其中，M为误分类点的集合。
感知机学习算法是误分类驱动的，具体采用随机梯度下降法（stochastic gradient descent）。假设误分类点集合M是固定的，那么损失函数

L(w,b) L ( w , b ) $L(w,b)$ 的梯度由

\nabla w L (w, b) = - \sum x i \in M y i x i

$\nabla _w L(w,b)=-\sum_{x_i \in M} y_i x_i$

\nabla b L (w, b) = - \sum x i \in M y i

$\nabla _b L(w,b)=-\sum_{x_i \in M} y_i$
给出。

1.3.1、感知机学习算法的原始形式

w \leftarrow w + η y i x i

$w\leftarrow w+\eta y_i x_i$

b \leftarrow b + η y i

$b\leftarrow b+\eta y_i$
（4）转至（2），直至训练集中没有误分类点

这种学习算法直观上有如下解释：当一个实例点被误分类，即位于分离超平面的错误一侧时，则调整w,b的值，使分离超平面向该误分类点的一侧移动，以减少该误分类点与超平面的距离，直至超平面越过该分类点使其被正确分类。
感知机学习算法由于采用不同的初值或选取不同的误分类点，解可以不同。

1.3.2、感知机学习算法的对偶形式

输入：训练数据集 $T=\{(x_1,y_1),(x_2,y_2),\dots ,(x_N,y_N)\}$ ，其中 $x_i \in \mathcal{X}=\mathbf {R}^n$ ， $y_i \in \mathcal {Y}=\{-1,+1\}$ ， $i=1,2,\dots,N$ ；学习率 $\eta(0<\eta \le 1)$ ；
输出： $\alpha ,b$ ；感知机模型 $f(x)=sign(\begin{matrix} \sum_{j=1}^{N} \end {matrix}\alpha _j y_j x_j x+b)$ 。
其中， $\alpha =(\alpha _1,\alpha _2,\dots, \alpha _N)$ .
（1） $\alpha \leftarrow 0,b \leftarrow 0$
（2）在训练集中选取数据 $(x_i,y_i)$
（3）如果 $y_i(\begin{matrix} \sum_{j=1}^{N} \end {matrix}\alpha _j y_j x_j x_i+b) \le 0$

α i \leftarrow α i + η

$\alpha _i \leftarrow \alpha _i + \eta$

b \leftarrow b + η y i

$b \leftarrow b + \eta y_i$
（4）转至（2）直到没有误分类数据

对偶形式中训练实例仅以内积的形式出现，为了方便，可以预先将训练集中实例间的内积计算出来并以矩阵形式存储，这个矩阵就是所谓的Gram矩阵（Gram matrix）

G = [x i x j] N \times N

$G=[x_ix_j]_{N\times N}$

1.3.3、总结

当训练数据集线性可分时，感知机学习算法是收敛的；否则，感知机学习算法不收敛，迭代结果会发生震荡。

2、代码实现

2.1、手工实现

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from random import shuffle
from sklearn.model_selection import train_test_split


class Perceptron:
        def __init__(self, X_train, y_train, lr = 0.1):
                self.X_train = X_train
                self.y_train = y_train
                self.w = np.zeros(self.X_train.shape[1])
                self.b = 0
                self.lr = lr

        def fit(self):
                iter_flag = True
                while iter_flag:
                        train_set = zip(self.X_train, self.y_train)
                        shuffle(train_set)
                        for X_i, y_i in train_set:
                                if y_i * (np.dot(self.w, X_i) + self.b) <= 0:
                                        self.w += self.lr * np.dot(X_i, y_i)
                                        self.b += self.lr * y_i
                                        break
                        else:
                                iter_flag = False

                print "Perceptron OK!"

        def predict(self, X_test):
                label_list = []
                for X in X_test:
                        label = np.sign(np.dot(self.w, X) + self.b)
                        label = label if label else -1
                        label_list.append(label)
                return np.array(label_list)

        def score(self, X_test, y_test):
                total_num = len(X_test)
                pre = (self.predict(X_test) == y_test).sum()
                score = pre/total_num
                return score


if __name__ == "__main__":
        iris = load_iris()
        X = iris.data[:100, :2]
        y = iris.target[:100]
        y[y == 0] = -1
        xlabel = iris.feature_names[0]
        ylabel = iris.feature_names[1]

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

        X_0 = X_train[y_train == -1]
        X_1 = X_train[y_train == 1]

        plt.figure("perceptron-mine")
        plt.scatter(X_0[:, 0], X_0[:, 1], label = '-1')
        plt.scatter(X_1[:, 0], X_1[:, 1], label = '1')
        plt.xlabel(xlabel)
        plt.ylabel(ylabel)
        plt.legend()

        clf = Perceptron(X, y)
        clf.fit()
        score = clf.score(X_test, y_test)
        print "score : %s" % score

        y_pre = clf.predict(X_test)
        X_test_pre_0 = X_test[y_pre == -1]
        X_test_pre_1 = X_test[y_pre == 1]
        plt.scatter(X_test_pre_0[:, 0], X_test_pre_0[:, 1], color = 'r', label = 'pre -1')
        plt.scatter(X_test_pre_1[:, 0], X_test_pre_1[:, 1], color = 'k', label = 'pre 1')
        plt.legend()


        X_simul = np.arange(4, 8)
        y_simul = -(clf.w[0] * X_simul + clf.b)/clf.w[1]
        plt.plot(X_simul, y_simul, color = 'g', label = 'model')
        plt.legend()
        plt.show()

2.2、sklearn实现

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split


if __name__ == "__main__":
        iris = load_iris()
        X = iris.data[:100, :2]
        y = iris.target[:100]
        y[y == 0] = -1
        xlabel = iris.feature_names[0]
        ylabel = iris.feature_names[1]

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

        X_0 = X_train[y_train == -1]
        X_1 = X_train[y_train == 1]

        plt.figure("perceptron-sklearn")
        plt.scatter(X_0[:, 0], X_0[:, 1], label = '-1')
        plt.scatter(X_1[:, 0], X_1[:, 1], label = '1')
        plt.xlabel(xlabel)
        plt.ylabel(ylabel)
        plt.legend()

        clf = Perceptron(max_iter = 1000)
        clf.fit(X, y)
        score = clf.score(X_test, y_test)
        print "score : %s" % score

        y_pre = clf.predict(X_test)
        X_test_pre_0 = X_test[y_pre == -1]
        X_test_pre_1 = X_test[y_pre == 1]
        plt.scatter(X_test_pre_0[:, 0], X_test_pre_0[:, 1], color = 'r', label = 'pre -1')
        plt.scatter(X_test_pre_1[:, 0], X_test_pre_1[:, 1], color = 'k', label = 'pre 1')
        plt.legend()

        X_simul = np.arange(4, 8)
        y_simul = -(clf.coef_[0, 0] * X_simul + clf.intercept_)/clf.coef_[0, 1]
        plt.plot(X_simul, y_simul, color = 'g', label = 'model')
        plt.legend()
        plt.show()

代码已上传至github：https://github.com/xiongzwfire/statistical-learning-method

参考文献

[1] http://blog.csdn.net/lyg1112/article/details/52572405
[2] https://github.com/wzyonggege/statistical-learning-method
以上为本文的全部参考文献，对原作者表示感谢。

Pikachu5808

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
《统计学习方法》系列（2）

本篇对应全书第二章，讲的是感知机。感知机（perceptron）是二类分类的线性分类模型，对应于输入空间（特征空间）中将数据进行线性划分的分离超平面，属于判别模型。感知机1957年由Rosenblatt提出，是神经网络与支持向量机的基础。1、理论讲解1.1、感知机模型假设输入空间（特征空间）是⊆RnX⊆Rn，输出空间是={1,−1}Y={1,−1}。输入x∈x∈X表示...
复制链接

扫一扫