Machine Learning Side Notes: The Perceptron Learning Rule

Article link: https://blog.csdn.net/liuhehe123/article/details/81571172

Loading a sample dataset, using the Iris dataset as an example

import pandas as pd

import matplotlib.pyplot as plt

from matplotlib.colors import ListedColormap

import numpy as np

>>> source_addr='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

>>> df = pd.read_csv(source_addr, header=None)

>>> df.tail() # check that the Iris dataset loaded correctly

0 1 2 3 4

145 6.7 3.0 5.2 2.3 Iris-virginica

146 6.3 2.5 5.0 1.9 Iris-virginica

147 6.5 3.0 5.2 2.0 Iris-virginica

148 6.2 3.4 5.4 2.3 Iris-virginica

149 5.9 3.0 5.1 1.8 Iris-virginica

>>> y = df.iloc[0:100, 4].values # pandas.DataFrame.iloc: take the first 100 rows and only column index 4, the class label (Iris-setosa for rows 0-49, Iris-versicolor for rows 50-99)

>>> y = np.where(y == 'Iris-setosa', -1, 1)

numpy's where(condition, T, F) works elementwise:

if condition holds

take T

else

take F
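As a small illustration of the rule above, here is a sketch on a made-up label array (the three labels are hypothetical and chosen only for the demo, they are not read from the dataset):

```python
import numpy as np

# Hypothetical label array, for illustration only
labels = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# Elementwise: -1 where the condition holds, 1 everywhere else
encoded = np.where(labels == 'Iris-setosa', -1, 1)
print(encoded)
```

This is the same encoding applied to y above: setosa becomes -1 and every other label becomes 1.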

>>> X = df.iloc[0:100, [0,2]].values # take the first 100 rows and keep only the first and third features of each sample (columns 0 and 2: sepal length and petal length)

df.iloc[0:100, [0,2]]

0 2

0 5.1 1.4

1 4.9 1.4

2 4.7 1.3

3 4.6 1.5

4 5.0 1.4

5 5.4 1.7

...

>>> df.iloc[0:100, [0,2]].values

array([[5.1, 1.4],

[4.9, 1.4],

[4.7, 1.3],

[4.6, 1.5],

[5. , 1.4],

[5.4, 1.7],

[4.6, 1.4],

...

>>> plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa') # draw a scatter plot

For example:

In [75]:X1 = [1,2,3,4,5,6]

In [76]: Y1 = [1,2,3,4,5,6]

In [77]: plt.scatter(X1,Y1,color='yellow', marker='o',label='he')

Out[77]: <matplotlib.collections.PathCollection at 0x7fca65dd9278>

In [78]: plt.show()

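Scatter plot:

Line plot:

X1 and Y1 are the x- and y-coordinates of the points; color sets the colour, marker sets the marker style, and label sets the legend text.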

>>> plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')

>>> plt.xlabel('sepal length') # x-axis label (column 0)

>>> plt.ylabel('petal length') # y-axis label (column 2)

>>> plt.legend(loc='upper left') # legend position

>>> plt.show()

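Resulting plot:

The figure above is a scatter plot of the extracted subset of the Iris data.

Next, we use this subset to train a perceptron.

First, some background on the perceptron model.

The perceptron is a linear classifier.

In the perceptron rule, Rosenblatt proposed a self-learning algorithm that finds the optimal weight coefficients; the products of these coefficients and the input values decide whether the neuron fires. Define the net input $z = w_0 + w_1 x_1 + \dots + w_m x_m$ and an activation function $\phi(z)$, the unit step function: $\phi(z) = 1$ if $z \ge 0$, and $\phi(z) = -1$ otherwise.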
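The net input and the unit step activation can be sketched in a few lines. The weights and the sample below are hypothetical values picked for the demo, and the threshold at zero is an assumption that matches the `predict` method later in this post:

```python
import numpy as np

def net_input(x, w):
    """Net input z = w_0 + w_1*x_1 + ... + w_m*x_m (w[0] is the bias weight)."""
    return np.dot(x, w[1:]) + w[0]

def step(z):
    """Unit step activation: +1 if z >= 0, otherwise -1."""
    return np.where(z >= 0, 1, -1)

# Hypothetical weights and sample, chosen only for the demo
w = np.array([-1.0, 0.5, 0.5])
x = np.array([1.0, 3.0])

z = net_input(x, w)  # -1.0 + 0.5*1.0 + 0.5*3.0
print(z, step(z))
```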

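Rosenblatt's original perceptron rule is quite simple:

1) Initialise the weights to 0 or to small random numbers.

2) For each training sample $x^{(i)}$, perform the following steps:

(1) Compute the output value $\hat{y}$.

(2) Update the weights.

Here the output value is the class label predicted by the unit step function defined earlier, and each weight $w_j$ in the weight vector is updated as:

$w_j := w_j + \Delta w_j$

The value $\Delta w_j$ used for the update is given by the perceptron learning rule:

$\Delta w_j = \eta \, (y^{(i)} - \hat{y}^{(i)}) \, x_j^{(i)}$

where $\eta$ is the learning rate (a constant between 0.0 and 1.0), $y^{(i)}$ is the true class label of the i-th sample, and $\hat{y}^{(i)}$ is the predicted class label. Note that all weights in the weight vector are updated simultaneously, which means $\hat{y}^{(i)}$ is not recomputed until every weight has been updated. Concretely, for a two-dimensional dataset the updates are:

$\Delta w_0 = \eta \, (y^{(i)} - \hat{y}^{(i)})$

$\Delta w_1 = \eta \, (y^{(i)} - \hat{y}^{(i)}) \, x_1^{(i)}$

$\Delta w_2 = \eta \, (y^{(i)} - \hat{y}^{(i)}) \, x_2^{(i)}$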
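A single update step for a two-feature sample might look like the sketch below. The learning rate, the sample and its label are assumed values for illustration; the prediction is computed once and then reused for all three weights, which is what "updated simultaneously" means here:

```python
import numpy as np

eta = 0.1                    # learning rate (assumed value)
w = np.zeros(3)              # [w_0, w_1, w_2], initialised to zero
x_i = np.array([5.1, 1.4])   # one two-feature sample (assumed)
y_i = -1                     # its true class label (assumed)

# Predict once, before any weight is changed (z = 0 here, so the prediction is +1)
y_hat = 1 if np.dot(x_i, w[1:]) + w[0] >= 0 else -1
delta = eta * (y_i - y_hat)  # shared factor eta * (y - y_hat)

# Every weight uses the same y_hat
w[0] += delta
w[1:] += delta * x_i
print(w)
```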

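As a quick check: when the perceptron predicts the class label correctly, the weights stay unchanged:

$\Delta w_j = \eta \, (-1 - (-1)) \, x_j^{(i)} = 0$

$\Delta w_j = \eta \, (1 - 1) \, x_j^{(i)} = 0$

When the prediction is wrong, however, the weights are pushed toward the positive class (+1) or the negative class (-1), respectively:

$\Delta w_j = \eta \, (1 - (-1)) \, x_j^{(i)} = 2\eta \, x_j^{(i)}$

$\Delta w_j = \eta \, (-1 - 1) \, x_j^{(i)} = -2\eta \, x_j^{(i)}$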
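The four possible combinations of true and predicted label can be checked numerically. This is only a sketch: the helper name `delta_w`, the learning rate and the feature value are all assumptions made for the demo.

```python
def delta_w(eta, y_true, y_pred, x_j):
    """Weight change for one feature under the perceptron learning rule."""
    return eta * (y_true - y_pred) * x_j

eta = 0.1   # assumed learning rate
x_j = 2.0   # assumed feature value

# Two correct predictions, then two wrong ones
for y_true, y_pred in [(-1, -1), (1, 1), (1, -1), (-1, 1)]:
    print(y_true, y_pred, delta_w(eta, y_true, y_pred, x_j))
```

Correct predictions leave the weight untouched, while a wrong prediction moves it by twice the learning rate times the feature value, in the direction of the true class.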

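Also note that the perceptron is only guaranteed to converge under two conditions: the two classes must be linearly separable, and the learning rate must be sufficiently small.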
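The linear-separability condition can be seen on two toy problems. The sketch below is not the class used in this post: it is a stripped-down training loop, with an assumed helper name, learning rate and epoch count, run once on the logical AND labels (separable) and once on XOR labels (not separable):

```python
import numpy as np

def count_errors_per_epoch(X, y, eta=0.5, n_iter=20):
    """Train a perceptron and return the number of weight updates in each epoch."""
    w = np.zeros(1 + X.shape[1])
    history = []
    for _ in range(n_iter):
        errors = 0
        for xi, target in zip(X, y):
            y_hat = 1 if np.dot(xi, w[1:]) + w[0] >= 0 else -1
            update = eta * (target - y_hat)
            w[1:] += update * xi
            w[0] += update
            errors += int(update != 0.0)
        history.append(errors)
    return history

X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])   # linearly separable
y_xor = np.array([-1, 1, 1, -1])    # not linearly separable

and_history = count_errors_per_epoch(X_toy, y_and)
xor_history = count_errors_per_epoch(X_toy, y_xor)
print("AND:", and_history)
print("XOR:", xor_history)
```

On the AND labels the error count drops to zero and stays there; on the XOR labels every epoch contains at least one misclassification, because no straight line separates the two classes, so the weights never stop changing.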

Below, the perceptron algorithm is implemented in Python:

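# -*- coding:utf-8 -*-
"""
Perceptron learning algorithm
"""
import numpy as np


class Perceptron(object):
    '''
    eta : float
        Learning rate
    n_iter : int
        Number of passes (epochs) over the training set
    errors_ : list
        Number of misclassified samples in each epoch
    '''

    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """
        Fit the training data using the update rule given above
        """
        self.w_ = np.zeros(1 + X.shape[1])  # one weight per column of X, plus 1 for the bias weight w_0
        self.errors_ = []  # misclassification count per epoch
        for _ in range(self.n_iter):  # number of epochs
            errors = 0  # reset at the start of each epoch
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                # the two lines above update the weights w_1 ... w_m
                self.w_[0] += update
                # update w_0
                errors += int(update != 0.0)
            # record the number of misclassifications in this epoch so that
            # the perceptron's performance on the training set can be judged later
            self.errors_.append(errors)

    def net_input(self, X):
        """Compute the net input z = w^T x + w_0"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        """Return the class label; np.where turns the net input into a binary output"""
        return np.where(self.net_input(X) >= 0, 1, -1)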

"""
Supplementary note on zip

In [85]: X1
Out[85]: [1, 2, 3, 4, 5, 6]

In [86]: Y1
Out[86]: [1, 2, 3, 4, 5, 6]

In [87]: for i in zip(X1,Y1):
    ...:     print(i)
    ...:
(1, 1)
(2, 2)
(3, 3)
(4, 4)
(5, 5)
(6, 6)
"""

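Now test the perceptron on the Iris data: train it on the extracted subset and plot the number of misclassifications per epoch as a line chart, to check whether the algorithm converges and finds a decision boundary that separates the two Iris classes: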

>>> import pandas as pd

>>> import matplotlib.pyplot as plt

>>> from matplotlib.colors import ListedColormap

>>> import numpy as np

>>> ppn = Perceptron(eta=0.1, n_iter=10)

>>> ppn.fit(X, y)

>>> plt.plot(range(1, len(ppn.errors_) + 1), ppn.errors_, marker='o')

>>> plt.xlabel('Epochs')

>>> plt.ylabel('Number of misclassifications')

>>> plt.show()

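Number of misclassifications per epoch, as shown in the figure below:

"""
Supplementary note: plot

x = [1,2,3,4,5,6,7,8,9,10]
y = [2,2,3,2,1,0,0,0,0,0]
plt.plot(x,y, marker='o') # draw a line plot
"""

As the plot shows, the classifier has converged after the sixth epoch and can classify the training samples correctly. Next, we implement a visualisation of the decision boundary for two-dimensional datasets: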

def plot_decision_regions(X, y, classifier, resolution=0.02):
    # set up marker and colour lists to choose from below
    markers = ('s', 'x', 'o', '^', 'v')  # scatter-plot markers
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')  # candidate colours
    cmap = ListedColormap(colors[:len(np.unique(y))])  # colormap used to fill the contour regions

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1  # range of the first feature, padded by 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
    # meshgrid turns the two 1-D arrays into coordinate matrices; see https://www.cnblogs.com/sunshinewang/p/6897966.html
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)  # let the perceptron classify every grid point
    Z = Z.reshape(xx1.shape)  # reshape to the grid's shape without changing the data
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)  # draw a filled contour plot, colouring each predicted region
    plt.xlim(xx1.min(), xx1.max())  # set the x limits of the current axes
    plt.ylim(xx2.min(), xx2.max())

    # plot the class samples
    for idx, cl in enumerate(np.unique(y)):
        # scatter plot of the samples belonging to class cl
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8, color=cmap(idx), marker=markers[idx], label=cl)

"""
Supplementary note: what np.meshgrid and Z.reshape produce

In [121]: xx1
Out[121]:
array([[3.3 , 3.32, 3.34, ..., 7.94, 7.96, 7.98],
       [3.3 , 3.32, 3.34, ..., 7.94, 7.96, 7.98],
       ...,
       [3.3 , 3.32, 3.34, ..., 7.94, 7.96, 7.98],
       [3.3 , 3.32, 3.34, ..., 7.94, 7.96, 7.98]])

In [122]: xx2
Out[122]:
array([[0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [0.02, 0.02, 0.02, ..., 0.02, 0.02, 0.02],
       [0.04, 0.04, 0.04, ..., 0.04, 0.04, 0.04],
       ...,
       [6.04, 6.04, 6.04, ..., 6.04, 6.04, 6.04],
       [6.06, 6.06, 6.06, ..., 6.06, 6.06, 6.06],
       [6.08, 6.08, 6.08, ..., 6.08, 6.08, 6.08]])

In [125]: Z.shape
Out[125]: (305, 235)

In [126]: Z
Out[126]:
array([[-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1],
       ...,
       [ 1,  1,  1, ...,  1,  1,  1],
       [ 1,  1,  1, ...,  1,  1,  1]])
"""
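The same grid-building steps are easier to follow on a tiny grid. In the sketch below the coordinate values are arbitrary, and the `np.where` rule is a hypothetical stand-in for `classifier.predict`, used only so that the block runs on its own:

```python
import numpy as np

# A 3 x 2 grid instead of the 235 x 305 grid used for the Iris plot
xx1, xx2 = np.meshgrid(np.array([0, 1, 2]), np.array([10, 20]))

# Flatten both coordinate matrices and pair them up: one (x1, x2) point per row
grid_points = np.array([xx1.ravel(), xx2.ravel()]).T

# Hypothetical stand-in for classifier.predict: +1 when x1 >= 1, else -1
Z = np.where(grid_points[:, 0] >= 1, 1, -1)

# Back to the grid's shape so that contourf can colour each cell
Z = Z.reshape(xx1.shape)
print(xx1.shape, grid_points.shape)
print(Z)
```

Each row of `grid_points` is one point of the grid, so the classifier is asked for a label at every grid position, and reshaping puts those labels back where `contourf` expects them.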

>>> plot_decision_regions(X, y, classifier=ppn)

>>> plt.xlabel('sepal length [cm]')

>>> plt.ylabel('petal length [cm]')

>>> plt.legend(loc='upper left')

>>> plt.show()