Implementing the Perceptron Algorithm in Python

Author: 想翻面的咸鱼

Tags: python, algorithms, classification

Article link: https://blog.csdn.net/sendiae/article/details/122109109

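The perceptron is a linear model for binary classification. Its input is the feature vector of an instance and its output is the instance's class, which takes one of two values, +1 or -1. The goal of the perceptron is to find a hyperplane that linearly separates the training data.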
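As a minimal sketch of the model just described, the perceptron assigns a class from the sign of the linear score w·x + b. The function name `predict` and the values of `w` and `b` below are illustrative assumptions, not taken from the code later in this post.

```python
import numpy as np

def predict(x, w, b):
    """Return +1 if x lies on the positive side of the hyperplane w.x + b = 0, else -1."""
    # Points exactly on the hyperplane (score == 0) are assigned to -1 by convention here
    return 1 if np.dot(w, x) + b > 0 else -1

# Illustrative weights and bias (assumed values, not learned from data)
w = np.array([1.0, -1.0])
b = 0.5

print(predict(np.array([2.0, 1.0]), w, b))   # score = 2 - 1 + 0.5 = 1.5
print(predict(np.array([0.0, 3.0]), w, b))   # score = 0 - 3 + 0.5 = -2.5
```

Training then amounts to choosing `w` and `b` so that every training sample lands on the side matching its label.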

Below, the perceptron algorithm is implemented on the iris dataset.

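A brief introduction to the iris data: the dataset is also known as Anderson's Iris data set. It is a table with 150 rows and 5 columns, that is, 150 samples (one per row), each with 4 features (the first 4 columns) and 1 label (the fifth column). The 4 features are sepal length, sepal width, petal length, and petal width. The label takes 3 values, one per species: Iris setosa, Iris versicolor, and Iris virginica.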

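The usual goal with this dataset is to build a classifier that predicts a sample's species from its four features. Because the perceptron is a binary classifier, the code below uses only the first 100 samples (labels 0 and 1) and two of the features, petal length and petal width.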

1. First, look at the dataset's contents and details:

from sklearn import datasets
iris = datasets.load_iris()

print(iris.data.shape)    # size of the feature matrix
print(iris.data[:5])      # first 5 rows of the data
print(iris.target.shape)  # size of the label array
print(iris.target)        # the labels themselves
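The numeric arrays printed above do not show which column or label is which. A short sketch, using the `feature_names` and `target_names` attributes that `load_iris` returns, makes the mapping explicit:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.feature_names)        # names of the 4 feature columns
print(iris.target_names)         # species names, indexed by label 0, 1, 2
print(np.bincount(iris.target))  # number of samples per label
```

Each of the three labels covers 50 samples, stored in order, which is why slicing the first 100 rows later yields exactly two classes.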

2. Create a DataFrame and load the data

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)  # one row per sample, one column per feature
df['label'] = iris.target  # add a label column holding the class labels

In Spyder you can inspect the structure of df directly in the variable explorer.
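Outside Spyder, the same inspection can be done from code. This is a small sketch using standard pandas calls; it rebuilds `df` so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target

print(df.shape)                    # (number of rows, number of columns)
print(df.head())                   # first 5 rows
print(df['label'].value_counts())  # number of samples per label
```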

3. Implementing the perceptron directly

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)  
df['label'] = iris.target

# Rename the columns, then plot petal length against petal width for the first two classes
df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
plt.scatter(df[:50]['petal length'],df[:50]['petal width'],label='0')
plt.scatter(df[50:100]['petal length'],df[50:100]['petal width'],label='1')
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.legend()
plt.show()

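# Implement the perceptron
# Take columns 2 and 3 as features (petal length, petal width); -1 selects the label column
data = np.array(df.iloc[:100, [2, 3, -1]])
X = data[:, :-1]  # features
y = data[:, -1]   # labels

y = np.array([1 if i == 1 else -1 for i in y])   # labels are 0 and 1 here; map class 0 to -1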

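class Model():
    def __init__(self, n_features=2):
        # Two features are used for training, so two weights are needed
        self.w = np.ones(n_features, dtype=np.float32)
        self.b = 0
        self.l_rate = 0.1

    def sign(self, x, w, b):
        # Despite its name, this returns the raw linear score w·x + b, not its sign
        y = np.dot(x, w) + b
        return y

    # Fit with stochastic gradient descent, one sample at a time
    def fit(self, X_train, y_train):
        converged = False
        # Note: this loop only terminates if the data are linearly separable
        while not converged:
            wrong_count = 0
            for d in range(len(X_train)):
                X = X_train[d]
                y = y_train[d]
                # If the sample is misclassified, take a gradient step
                if y * self.sign(X, self.w, self.b) <= 0:
                    self.w = self.w + self.l_rate * y * X
                    self.b = self.b + self.l_rate * y
                    wrong_count += 1

            if wrong_count == 0:  # every sample is classified correctly, so stop
                converged = True
        return 'Perceptron Model'

    def score(self):
        # Placeholder: evaluation is not implemented
        pass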
    
# Fit the model
perceptron = Model()
perceptron.fit(X, y)

print('Weights:', perceptron.w[0], perceptron.w[1])
print('Bias:', perceptron.b)
# Visualize the result
x_points = np.linspace(-3, 7, 10)
y_ = -(perceptron.w[0]*x_points + perceptron.b)/perceptron.w[1]
plt.plot(x_points, y_)  # plot the learned separating hyperplane (a line in 2D)

# Use the marker 'o' alone; 'bo' would also fix the color and conflict with color=
plt.plot(data[:50, 0], data[:50, 1], 'o', color='blue', label='0')  # first 50 rows; column 0 is petal length
plt.plot(data[50:100, 0], data[50:100, 1], 'o', color='orange', label='1')
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.legend()
plt.show()

Classification result: the plot shows the learned line together with the samples of the two classes.
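The `score` method in the class above is only a placeholder, so the quality of the fit is judged by eye from the plot. The following is a minimal sketch of one way to check it numerically. The function `train_perceptron` and its `max_epochs` cap are assumptions added here for illustration; the update rule and starting values mirror the class above, but the cap guards against an endless loop on data that are not linearly separable.

```python
import numpy as np
from sklearn.datasets import load_iris

def train_perceptron(X, y, l_rate=0.1, max_epochs=1000):
    """Perceptron training with a cap on passes over the data (max_epochs is an added safeguard)."""
    w = np.ones(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        wrong_count = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified or on the boundary
                w += l_rate * y_i * x_i
                b += l_rate * y_i
                wrong_count += 1
        if wrong_count == 0:   # a full pass with no mistakes: stop
            break
    return w, b

iris = load_iris()
X = iris.data[:100, [2, 3]]                  # petal length, petal width
y = np.where(iris.target[:100] == 1, 1, -1)  # label 1 -> +1, label 0 -> -1

w, b = train_perceptron(X, y)
predictions = np.where(X @ w + b > 0, 1, -1)
accuracy = np.mean(predictions == y)
print('Training accuracy:', accuracy)
```

Accuracy here is measured on the training data only, so it says whether a separating line was found, not how well the model generalizes.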

4. Building a perceptron with sklearn

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from sklearn.linear_model import Perceptron

iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)  
df['label'] = iris.target

# Rename the columns, then plot petal length against petal width for the first two classes
df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
plt.scatter(df[:50]['petal length'],df[:50]['petal width'],label='0')
plt.scatter(df[50:100]['petal length'],df[50:100]['petal width'],label='1')
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.legend()
plt.show()

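# Implement the perceptron
# Take columns 2 and 3 as features (petal length, petal width); -1 selects the label column
data = np.array(df.iloc[:100, [2, 3, -1]])
X = data[:, :-1]  # features
y = data[:, -1]   # labels
y = np.array([1 if i == 1 else -1 for i in y])   # labels are 0 and 1 here; map class 0 to -1

# fit_intercept: if False, the intercept is fixed at 0; if True, it is learned from the data.
# Setting it to False goes wrong on this data.

clf = Perceptron(fit_intercept=True, max_iter=100, shuffle=True)
clf.fit(X, y)

# Print the weights and the bias
print('Weights:', clf.coef_)
print('Bias:', clf.intercept_)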

# Visualize the result
x_points = np.linspace(-3, 7, 10)
y_ = -(clf.coef_[0][0]*x_points + clf.intercept_[0])/clf.coef_[0][1]
plt.plot(x_points, y_)  # plot the learned separating hyperplane (a line in 2D)

# Use the marker 'o' alone; 'bo' would also fix the color and conflict with color=
plt.plot(data[:50, 0], data[:50, 1], 'o', color='blue', label='0')  # first 50 rows; column 0 is petal length
plt.plot(data[50:100, 0], data[50:100, 1], 'o', color='orange', label='1')
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.legend()
plt.show()

Classification result: the plot shows the learned line together with the samples of the two classes.
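Besides plotting, the fitted sklearn model can be checked with its `score` and `predict` methods. The sketch below rebuilds the data so that it runs on its own. `random_state=0` and the two query points are assumptions added here so that the shuffled training is repeatable and the example has something to predict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:100, [2, 3]]                  # petal length, petal width
y = np.where(iris.target[:100] == 1, 1, -1)  # label 1 -> +1, label 0 -> -1

# random_state is fixed only to make the shuffling repeatable (an assumption, not in the post)
clf = Perceptron(fit_intercept=True, max_iter=100, shuffle=True, random_state=0)
clf.fit(X, y)

print('Training accuracy:', clf.score(X, y))   # fraction of training samples classified correctly

# Two hypothetical flowers given as [petal length, petal width]
print('Predicted classes:', clf.predict([[1.4, 0.2], [4.5, 1.5]]))
```

Note that `Perceptron` also has a `tol` parameter that can stop training before `max_iter` is reached. If the training accuracy is below 1.0 on separable data, passing `tol=None` to disable that stopping criterion is worth trying.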

5. Conclusion

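Both implementations classify the dataset well, but the weights and bias they learn are different. This reflects an important property of the perceptron: it stops as soon as it finds any hyperplane that separates the classes and does not take the distance from the points to the hyperplane into account. The solution is therefore not unique and depends on choices such as the initial weights, the learning rate, and the order in which samples are visited.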
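The non-uniqueness can be seen on a toy problem small enough to trace by hand. This is a sketch under stated assumptions: the two-point dataset, the function `train_perceptron`, its `w_init` parameter, and the learning rate of 1.0 are all made up for illustration. The same update rule, started from two different initial weight vectors, ends at two different separating lines.

```python
import numpy as np

def train_perceptron(X, y, w_init, l_rate=1.0, max_epochs=100):
    """Perceptron training from a given initial weight vector (bias starts at 0)."""
    w = np.array(w_init, dtype=float)
    b = 0.0
    for _ in range(max_epochs):
        wrong_count = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified or on the boundary
                w += l_rate * y_i * x_i
                b += l_rate * y_i
                wrong_count += 1
        if wrong_count == 0:
            break
    return w, b

# Toy data (assumed): one negative point and one positive point
X_toy = np.array([[1.0, 1.0], [3.0, 3.0]])
y_toy = np.array([-1, 1])

w_a, b_a = train_perceptron(X_toy, y_toy, w_init=[0.0, 0.0])   # ends at w = [1, 1], b = -3
w_b, b_b = train_perceptron(X_toy, y_toy, w_init=[1.0, 0.0])   # ends at w = [1, 0], b = -2
print(w_a, b_a)
print(w_b, b_b)

# Both solutions put every point on the correct side of their own line
sep_a = bool(np.all(y_toy * (X_toy @ w_a + b_a) > 0))
sep_b = bool(np.all(y_toy * (X_toy @ w_b + b_b) > 0))
print(sep_a, sep_b)
```

Both runs separate the two points, yet the lines differ, which matches the difference observed between the hand-written model and the sklearn model.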
