Perceptron (PLA)
I. Basic Derivation
1. Input: the feature vector of an instance;
2. Output: the class of the instance (two classes, +1 and -1);
3. Model: $f(x)=\mathrm{sign}(w \cdot x+b)$, with parameters $w, b$;
4. Applicable problem: the data must be linearly separable, i.e. all positive and negative instance points can be completely and correctly divided onto the two sides of a hyperplane $w \cdot x+b=0$.
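As a minimal sketch of what linear separability means (the toy data and the hand-picked hyperplane below are hypothetical), a candidate $(w, b)$ separates the data exactly when $y_i(w \cdot x_i + b) > 0$ holds for every sample:

```python
import numpy as np

# hypothetical toy data: two classes on either side of the line x1 + x2 - 3 = 0
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])  # hand-picked hyperplane parameters (an assumption)
b = -3.0

# (w, b) separates the data iff y_i * (w·x_i + b) > 0 for all i
separable = bool(np.all(y * (X @ w + b) > 0))
print(separable)  # True
```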
5. Learning strategy: choose a loss function. If the number of misclassified points were used as the loss, it would not be a continuously differentiable function of the parameters $w, b$; we therefore consider the total distance from the misclassified points to the hyperplane instead. The distance from a point to the hyperplane is
$$\frac{1}{\|w\|}|w \cdot x+b| \tag{1}$$
(Here $w$ can be viewed as the normal vector of the hyperplane. In three-dimensional space, the distance from a point to the plane $Ax+By+Cz+D=0$ is $d=\frac{|Ax+By+Cz+D|}{\sqrt{A^2+B^2+C^2}}$, where $(A,B,C)$ is the plane's normal vector; generalizing to higher dimensions gives the formula above.)
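The 3-D formula generalizes directly to $n$ dimensions; a small sketch (the point and hyperplane values are illustrative):

```python
import numpy as np

w = np.array([1.0, 1.0])   # normal vector of the hyperplane w·x + b = 0
b = -3.0
x = np.array([4.0, 3.0])   # an illustrative point

# distance = |w·x + b| / ||w||, the n-dimensional analogue of
# |Ax + By + Cz + D| / sqrt(A^2 + B^2 + C^2)
dist = abs(w @ x + b) / np.linalg.norm(w)
print(dist)  # |4 + 3 - 3| / sqrt(2) ≈ 2.828
```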
Observe that for a misclassified data point $-y_i(w\cdot x_i+b)>0$ always holds; since $y_i=\pm1$, expression (1) can be written as:
$$-\frac{1}{\|w\|}y_i(w \cdot x_i +b)$$
The total distance over the set of misclassified points $M$ is:
$$-\frac{1}{\|w\|}\sum_{x_i\in M}y_i(w \cdot x_i +b) \tag{2}$$
Dropping the factor $\frac{1}{\|w\|}$, the loss function is defined as the expression below; the goal is to find parameters $w, b$ that minimize it.
$$L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b) \tag{3}$$
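Loss (3) sums only over the misclassified points; a minimal sketch with hypothetical samples and trial parameters:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # hypothetical samples
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])   # trial parameters that misclassify one point
b = 0.0

margins = y * (X @ w + b)     # y_i (w·x_i + b) for every sample
mis = margins <= 0            # the misclassified set M (margin 0 counts as wrong)
loss = -np.sum(margins[mis])  # L(w,b) = -Σ_{x_i∈M} y_i (w·x_i + b)
print(loss)  # 1.0 — only the third point (margin -1) contributes
```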
6. Parameter update: use stochastic gradient descent. Initialize some $w_0, b_0$, then keep minimizing the loss. The minimization does not apply gradient descent to all misclassified points in $M$ at once; instead it randomly selects one misclassified point at a time and descends along its gradient. (Here $\eta\ (0<\eta \leq 1)$ is the step size, also called the learning rate.)
$$\nabla_w L(w,b)=-\sum_{x_i\in M}y_ix_i \Rightarrow w:=w+\eta y_ix_i$$
$$\nabla_b L(w,b)=-\sum_{x_i\in M}y_i \Rightarrow b:=b+\eta y_i$$
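One stochastic update step following the rules above, as a sketch (all numeric values are hypothetical):

```python
import numpy as np

eta = 0.1                      # learning rate η
w = np.array([1.0, 0.0])
b = 0.0
x_i = np.array([1.0, 1.0])     # a misclassified sample: y_i (w·x_i + b) = -1 <= 0
y_i = -1

if y_i * (w @ x_i + b) <= 0:   # only update on a misclassified point
    w = w + eta * y_i * x_i    # w := w + η y_i x_i
    b = b + eta * y_i          # b := b + η y_i
print(w, b)  # w is now [0.9, -0.1] and b is -0.1
```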
7. Category: supervised learning; non-probabilistic, linear, parametric model.
8. Convergence: since the training set is linearly separable, there must exist a hyperplane $\hat{w}_{opt}\cdot \hat{x}=w_{opt}\cdot x+b_{opt}=0$ (using the augmented vectors $\hat{x}=(x^T,1)^T$ and $\hat{w}=(w^T,b)^T$) that separates the data completely and correctly. By the "Radius-Margin Bound" theorem (Novikoff's perceptron convergence theorem), the number of misclassifications satisfies $T\leq\left(\frac{R}{\gamma}\right)^2$, where $R=\max_i \|\hat{x}_i\|$ and $\gamma=\min_i \frac{y_i(\hat{w}_{opt}\cdot \hat{x}_i)}{\|\hat{w}_{opt}\|}$. Hence a suitable hyperplane can be found after a finite number of searches.
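For a concrete (hypothetical) separable dataset, the bound $(R/\gamma)^2$ can be evaluated numerically; the snippet below builds the augmented vectors $\hat{x}_i=(x_i,1)$ and $\hat{w}_{opt}=(w_{opt},b_{opt})$, with the separating hyperplane chosen by hand:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # hypothetical separable data
y = np.array([1, 1, -1])
w_opt = np.array([1.0, 1.0])   # a hyperplane that separates the data (assumed)
b_opt = -5.0

X_hat = np.hstack([X, np.ones((len(X), 1))])  # augmented samples (x_i, 1)
w_hat = np.append(w_opt, b_opt)               # augmented weights (w, b)

R = np.max(np.linalg.norm(X_hat, axis=1))     # R = max_i ||x̂_i||
gamma = np.min(y * (X_hat @ w_hat) / np.linalg.norm(w_hat))  # γ = min_i y_i ŵ·x̂_i / ||ŵ||
T_bound = (R / gamma) ** 2                    # mistakes T ≤ (R/γ)²
print(T_bound)  # R² = 26, γ² = 1/27, so the bound is 26·27 = 702
```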
9. Notes: the perceptron learning algorithm admits many solutions, which depend on the initial values and on the order in which misclassified points are selected during iteration.
Algorithm Flow
Input: training set $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i\in X=R^n$, $y_i\in Y=\{-1,1\}$, $i=1,2,...,N$;
Output: $w, b$;
Step 1: initialize $w_0, b_0$;
Step 2: select a sample $(x_i, y_i)$ from the training set;
Step 3: if $y_i(w\cdot x_i+b)\leq 0$, update the parameters: $w:=w+\eta y_ix_i$, $b:=b+\eta y_i$;
Step 4: if misclassified points remain, go to Step 2; otherwise, stop.
Python code
# -*- coding: utf-8 -*-
"""
Usage: 1. Edit the input data file in the first few lines;
       2. The initial parameters can be changed; in particular, match the
          dimension of the weight vector to the data;
       3. Plotting only supports 2-D or 3-D inputs; higher dimensions
          cannot be visualized.
"""
import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # for the (optional) 3-D plot
import time

data = np.loadtxt(open('data1.csv', 'r', encoding='utf8'), delimiter=",", skiprows=0)
data = data.tolist()  # convert to a Python list
random.shuffle(data)  # shuffle the samples
x_train = []
y_train = []
x_test = []
y_test = []
# 75/25 train/test split
for i in range(int(0.75 * len(data))):
    x_train.append(data[i][0:2])
    y_train.append(data[i][2])
for i in range(int(0.75 * len(data)), len(data)):
    x_test.append(data[i][0:2])
    y_test.append(data[i][2])
# end of data loading
# initial parameters
WEIGHT = [0, 0]  # weights; add elements here if the data have more dimensions
BIAS = 0         # bias
LR = 0.1         # learning rate

# training and testing
weight = WEIGHT
bias = BIAS
learning_rate = LR

def sign(v):
    if v >= 0:
        return 1
    else:
        return -1
def training():
    global weight
    global bias
    global learning_rate
    while 1:
        x = random.choice(x_train)     # pick a random sample x
        y = y_train[x_train.index(x)]  # look up the label y of x (assumes unique samples)
        x = np.array(x)                # convert x to an array
        y_predict = sign(np.array(weight).dot(x) + bias)
        print("train data: ", x)
        print("true result: %d , predict result: %d" % (y, y_predict))
        if y * y_predict <= 0:         # misclassified: update the parameters
            weight = weight + learning_rate * y * x
            bias = bias + learning_rate * y
            print("update weight:", weight)
            print("update bias:", bias)
        num = 0
        for i in range(len(x_train)):
            if sign(np.array(weight).dot(x_train[i]) + bias) == y_train[i]:
                num += 1
        if num == len(x_train):        # stop once no training point is misclassified
            break
    print("======stop training=====, the weight and bias are:", weight, bias)
    return weight, bias
def test():
    global x_test
    weight, bias = training()
    y_predict = [0] * len(x_test)
    right_num = 0
    for i in range(len(x_test)):
        y_predict[i] = sign(np.array(weight).dot(np.array(x_test[i])) + bias)
        if y_predict[i] == y_test[i]:
            right_num += 1
        print("test data:", x_test[i])
        print("y_predict:", y_predict[i])
    right_rate = right_num / len(y_test)
    print("right rate is: ", right_rate)
    # plotting: choose the block matching the input dimension;
    # more than 3 dimensions cannot be visualized
    # 2-D plot
    x_test = np.array(x_test)
    # test-set points
    for i in range(len(x_test)):
        if y_test[i] == 1:
            plt.plot(x_test[i][0], x_test[i][1], 'ro')
        else:
            plt.plot(x_test[i][0], x_test[i][1], 'bo')
    # separating line in 2-D space
    X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    Y = (-weight[0] * X - bias) / weight[1]
    plt.plot(X, Y)
    plt.show()
    # # 3-D plot
    # x_test = np.array(x_test)
    # fig = plt.figure()
    # ax = Axes3D(fig)
    # X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    # Y = np.arange(min(x_test[:, 1]) - 2, max(x_test[:, 1]) + 2)
    # X, Y = np.meshgrid(X, Y)
    # Z = (-weight[0] * X - weight[1] * Y - bias) / weight[2]
    # # test-set points
    # for i in range(len(x_test)):
    #     if y_test[i] == 1:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'ro')
    #     else:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'bo')
    # # separating plane in 3-D space
    # ax.plot_surface(X, Y, Z)
    # plt.show()
if __name__ == "__main__":
    time_start = time.time()
    test()
    time_end = time.time()
    print('running time is:', time_end - time_start)
Visualization: (figure omitted)
Improvement: the Pocket Algorithm
For linearly non-separable data, plain PLA never reaches its termination condition. The pocket algorithm therefore tolerates some error: it keeps the best weights and bias found so far in a "pocket". At each candidate update, the stored parameters are replaced only if the new ones achieve higher accuracy; otherwise they are kept. The termination condition is also no longer "no misclassified points" but reaching a maximum number of iterations.
Python code
# -*- coding: utf-8 -*-
"""
Usage: 1. Edit the input data file in the first few lines;
       2. The initial parameters can be changed; in particular, match the
          dimension of the weight vector to the data;
       3. Plotting only supports 2-D or 3-D inputs; higher dimensions
          cannot be visualized.
"""
import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # for the (optional) 3-D plot
import time

data = np.loadtxt(open('data2.csv', 'r', encoding='utf8'), delimiter=",", skiprows=0)
data = data.tolist()  # convert to a Python list
random.shuffle(data)  # shuffle the samples
x_train = []
y_train = []
x_test = []
y_test = []
# 75/25 train/test split
for i in range(int(0.75 * len(data))):
    x_train.append(data[i][0:2])
    y_train.append(data[i][2])
for i in range(int(0.75 * len(data)), len(data)):
    x_test.append(data[i][0:2])
    y_test.append(data[i][2])
# initial parameters
WEIGHT = [0, 0]  # weights; add elements here if the data have more dimensions
BIAS = 0         # bias
LR = 0.1         # learning rate
NUM = 100        # maximum number of iterations

# training and testing
weight = WEIGHT
bias = BIAS
learning_rate = LR

def sign(v):
    if v >= 0:
        return 1
    else:
        return -1
def training():
    global weight
    global bias
    global learning_rate
    pocket = []  # stores the best [w0, w1, b] found so far
    pocket.append(weight[0])
    pocket.append(weight[1])
    pocket.append(bias)
    max_num = 0
    for i in range(NUM):
        x = random.choice(x_train)     # pick a random sample x
        y = y_train[x_train.index(x)]  # look up the label y of x (assumes unique samples)
        x = np.array(x)                # convert x to an array
        y_predict = sign(np.array(weight).dot(x) + bias)
        print("train data: ", x)
        print("true result: %d , predict result: %d" % (y, y_predict))
        if y * y_predict <= 0:         # misclassified: update the parameters
            weight = weight + learning_rate * y * x
            bias = bias + learning_rate * y
            print("update weight:", weight)
            print("update bias:", bias)
        num = 0
        for j in range(len(x_train)):
            if sign(np.array(weight).dot(x_train[j]) + bias) == y_train[j]:
                num += 1
        if num > max_num:              # better than the pocket: replace it
            max_num = num
            pocket.append(weight[0])
            pocket.append(weight[1])
            pocket.append(bias)
    print("======stop training=====, the weight and bias are:", pocket[-3:-1], pocket[-1])
    return pocket[-3:-1], pocket[-1]
def test():
    global x_test
    weight, bias = training()
    y_predict = [0] * len(x_test)
    right_num = 0
    for i in range(len(x_test)):
        y_predict[i] = sign(np.array(weight).dot(np.array(x_test[i])) + bias)
        if y_predict[i] == y_test[i]:
            right_num += 1
        print("test data:", x_test[i])
        print("y_predict:", y_predict[i])
    right_rate = right_num / len(y_test)
    print("right rate is: ", right_rate)
    # plotting: choose the block matching the input dimension;
    # more than 3 dimensions cannot be visualized
    # 2-D plot
    x_test = np.array(x_test)
    # test-set points
    for i in range(len(x_test)):
        if y_test[i] == 1:
            plt.plot(x_test[i][0], x_test[i][1], 'ro')
        else:
            plt.plot(x_test[i][0], x_test[i][1], 'bo')
    # separating line in 2-D space
    X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    Y = (-weight[0] * X - bias) / weight[1]
    plt.plot(X, Y)
    plt.show()
    # # 3-D plot
    # x_test = np.array(x_test)
    # fig = plt.figure()
    # ax = Axes3D(fig)
    # X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    # Y = np.arange(min(x_test[:, 1]) - 2, max(x_test[:, 1]) + 2)
    # X, Y = np.meshgrid(X, Y)
    # Z = (-weight[0] * X - weight[1] * Y - bias) / weight[2]
    # # test-set points
    # for i in range(len(x_test)):
    #     if y_test[i] == 1:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'ro')
    #     else:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'bo')
    # # separating plane in 3-D space
    # ax.plot_surface(X, Y, Z)
    # plt.show()
if __name__ == "__main__":
    time_start = time.time()
    test()
    time_end = time.time()
    print('running time is:', time_end - time_start)
Visualization: (figure omitted)