Perceptron (PLA)
I. Basic Derivation
1. Input: the feature vector of an instance;
2. Output: the class of the instance (two classes, +1 and -1);
3. Model: $f(x)=\mathrm{sign}(w \cdot x+b)$, with parameters $w, b$;
4. Applicable problem: the data must be linearly separable, i.e. all positive and negative instance points can be completely and correctly divided onto the two sides of a hyperplane $w \cdot x+b=0$.
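As a minimal sketch of what linear separability means (the toy data and the hand-picked hyperplane below are hypothetical), a candidate $(w, b)$ separates the data exactly when $y_i(w \cdot x_i + b) > 0$ holds for every sample:

```python
import numpy as np

# hypothetical toy data: two classes on either side of the line x1 + x2 - 3 = 0
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])  # hand-picked hyperplane parameters (an assumption)
b = -3.0

# (w, b) separates the data iff y_i * (w·x_i + b) > 0 for all i
separable = bool(np.all(y * (X @ w + b) > 0))
print(separable)  # True
```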
5. Learning strategy: choose a loss function. If the number of misclassified points were used as the loss, it would not be a continuously differentiable function of the parameters $w, b$; we therefore consider the total distance from the misclassified points to the hyperplane instead. The distance from a point to the hyperplane is
$$\frac{1}{\|w\|}|w \cdot x+b| \tag{1}$$
(Here $w$ can be viewed as the normal vector of the hyperplane. In three-dimensional space, the distance from a point to the plane $Ax+By+Cz+D=0$ is $d=\frac{|Ax+By+Cz+D|}{\sqrt{A^2+B^2+C^2}}$, where $(A,B,C)$ is the plane's normal vector; generalizing to higher dimensions gives the formula above.)
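The 3-D formula generalizes directly to $n$ dimensions; a small sketch (the point and hyperplane values are illustrative):

```python
import numpy as np

w = np.array([1.0, 1.0])   # normal vector of the hyperplane w·x + b = 0
b = -3.0
x = np.array([4.0, 3.0])   # an illustrative point

# distance = |w·x + b| / ||w||, the n-dimensional analogue of
# |Ax + By + Cz + D| / sqrt(A^2 + B^2 + C^2)
dist = abs(w @ x + b) / np.linalg.norm(w)
print(dist)  # |4 + 3 - 3| / sqrt(2) ≈ 2.828
```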
Observe that for a misclassified data point $-y_i(w\cdot x_i+b)>0$ always holds; since $y_i=\pm1$, expression (1) can be written as:
$$-\frac{1}{\|w\|}y_i(w \cdot x_i +b)$$
The total distance over the set of misclassified points $M$ is:
$$-\frac{1}{\|w\|}\sum_{x_i\in M}y_i(w \cdot x_i +b) \tag{2}$$
Dropping the factor $\frac{1}{\|w\|}$, the loss function is defined as the expression below; the goal is to find parameters $w, b$ that minimize it.
$$L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b) \tag{3}$$
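Loss (3) sums only over the misclassified points; a minimal sketch with hypothetical samples and trial parameters:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # hypothetical samples
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])   # trial parameters that misclassify one point
b = 0.0

margins = y * (X @ w + b)     # y_i (w·x_i + b) for every sample
mis = margins <= 0            # the misclassified set M (margin 0 counts as wrong)
loss = -np.sum(margins[mis])  # L(w,b) = -Σ_{x_i∈M} y_i (w·x_i + b)
print(loss)  # 1.0 — only the third point (margin -1) contributes
```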
6. Parameter update: use stochastic gradient descent. Initialize some $w_0, b_0$, then keep minimizing the loss. The minimization does not apply gradient descent to all misclassified points in $M$ at once; instead it randomly selects one misclassified point at a time and descends along its gradient. (Here $\eta\ (0<\eta \leq 1)$ is the step size, also called the learning rate.)
$$\nabla_w L(w,b)=-\sum_{x_i\in M}y_ix_i \Rightarrow w:=w+\eta y_ix_i$$
$$\nabla_b L(w,b)=-\sum_{x_i\in M}y_i \Rightarrow b:=b+\eta y_i$$
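One stochastic update step following the rules above, as a sketch (all numeric values are hypothetical):

```python
import numpy as np

eta = 0.1                      # learning rate η
w = np.array([1.0, 0.0])
b = 0.0
x_i = np.array([1.0, 1.0])     # a misclassified sample: y_i (w·x_i + b) = -1 <= 0
y_i = -1

if y_i * (w @ x_i + b) <= 0:   # only update on a misclassified point
    w = w + eta * y_i * x_i    # w := w + η y_i x_i
    b = b + eta * y_i          # b := b + η y_i
print(w, b)  # w is now [0.9, -0.1] and b is -0.1
```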
7. Category: supervised learning; non-probabilistic, linear, parametric model.
8. Convergence: since the training set is linearly separable, there must exist a hyperplane $\hat{w}_{opt}\cdot \hat{x}=w_{opt}\cdot x+b_{opt}=0$ (using the augmented vectors $\hat{x}=(x^T,1)^T$ and $\hat{w}=(w^T,b)^T$) that separates the data completely and correctly. By the "Radius-Margin Bound" theorem (Novikoff's perceptron convergence theorem), the number of misclassifications satisfies $T\leq\left(\frac{R}{\gamma}\right)^2$, where $R=\max_i \|\hat{x}_i\|$ and $\gamma=\min_i \frac{y_i(\hat{w}_{opt}\cdot \hat{x}_i)}{\|\hat{w}_{opt}\|}$. Hence a suitable hyperplane can be found after a finite number of searches.
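For a concrete (hypothetical) separable dataset, the bound $(R/\gamma)^2$ can be evaluated numerically; the snippet below builds the augmented vectors $\hat{x}_i=(x_i,1)$ and $\hat{w}_{opt}=(w_{opt},b_{opt})$, with the separating hyperplane chosen by hand:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # hypothetical separable data
y = np.array([1, 1, -1])
w_opt = np.array([1.0, 1.0])   # a hyperplane that separates the data (assumed)
b_opt = -5.0

X_hat = np.hstack([X, np.ones((len(X), 1))])  # augmented samples (x_i, 1)
w_hat = np.append(w_opt, b_opt)               # augmented weights (w, b)

R = np.max(np.linalg.norm(X_hat, axis=1))     # R = max_i ||x̂_i||
gamma = np.min(y * (X_hat @ w_hat) / np.linalg.norm(w_hat))  # γ = min_i y_i ŵ·x̂_i / ||ŵ||
T_bound = (R / gamma) ** 2                    # mistakes T ≤ (R/γ)²
print(T_bound)  # R² = 26, γ² = 1/27, so the bound is 26·27 = 702
```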
9. Notes: the perceptron learning algorithm admits many solutions, which depend on the initial values and on the order in which misclassified points are selected during iteration.
Algorithm Flow
Input: training set $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i\in X=R^n$, $y_i\in Y=\{-1,1\}$, $i=1,2,...,N$;
Output: $w, b$;
Step 1: initialize $w_0, b_0$;
Step 2: select a sample $(x_i, y_i)$ from the training set;
Step 3: if $y_i(w\cdot x_i+b)\leq 0$, update the parameters: $w:=w+\eta y_ix_i$, $b:=b+\eta y_i$;
Step 4: if misclassified points remain, go to Step 2; otherwise, stop.
Python code
# -*- coding: utf-8 -*-
"""
Usage: 1. Edit the input data file in the first few lines;
       2. The initial parameters can be changed; in particular, match the
          dimension of the weight vector to the data;
       3. Plotting only supports 2-D or 3-D inputs; higher dimensions
          cannot be visualized.
"""
import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # for the (optional) 3-D plot
import time

data = np.loadtxt(open('data1.csv', 'r', encoding='utf8'), delimiter=",", skiprows=0)
data = data.tolist()  # convert to a Python list
random.shuffle(data)  # shuffle the samples
x_train = []
y_train = []
x_test = []
y_test = []
# 75/25 train/test split
for i in range(int(0.75 * len(data))):
    x_train.append(data[i][0:2])
    y_train.append(data[i][2])
for i in range(int(0.75 * len(data)), len(data)):
    x_test.append(data[i][0:2])
    y_test.append(data[i][2])
# end of data loading
# initial parameters
WEIGHT = [0, 0]  # weights; add elements here if the data have more dimensions
BIAS = 0         # bias
LR = 0.1         # learning rate

# training and testing
weight = WEIGHT
bias = BIAS
learning_rate = LR

def sign(v):
    if v >= 0:
        return 1
    else:
        return -1
def training():
    global weight
    global bias
    global learning_rate
    while 1:
        x = random.choice(x_train)     # pick a random sample x
        y = y_train[x_train.index(x)]  # look up the label y of x (assumes unique samples)
        x = np.array(x)                # convert x to an array
        y_predict = sign(np.array(weight).dot(x) + bias)
        print("train data: ", x)
        print("true result: %d , predict result: %d" % (y, y_predict))
        if y * y_predict <= 0:         # misclassified: update the parameters
            weight = weight + learning_rate * y * x
            bias = bias + learning_rate * y
            print("update weight:", weight)
            print("update bias:", bias)
        num = 0
        for i in range(len(x_train)):
            if sign(np.array(weight).dot(x_train[i]) + bias) == y_train[i]:
                num += 1
        if num == len(x_train):        # stop once no training point is misclassified
            break
    print("======stop training=====, the weight and bias are:", weight, bias)
    return weight, bias
def test():
    global x_test
    weight, bias = training()
    y_predict = [0] * len(x_test)
    right_num = 0
    for i in range(len(x_test)):
        y_predict[i] = sign(np.array(weight).dot(np.array(x_test[i])) + bias)
        if y_predict[i] == y_test[i]:
            right_num += 1
        print("test data:", x_test[i])
        print("y_predict:", y_predict[i])
    right_rate = right_num / len(y_test)
    print("right rate is: ", right_rate)
    # plotting: choose the block matching the input dimension;
    # more than 3 dimensions cannot be visualized
    # 2-D plot
    x_test = np.array(x_test)
    # test-set points
    for i in range(len(x_test)):
        if y_test[i] == 1:
            plt.plot(x_test[i][0], x_test[i][1], 'ro')
        else:
            plt.plot(x_test[i][0], x_test[i][1], 'bo')
    # separating line in 2-D space
    X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    Y = (-weight[0] * X - bias) / weight[1]
    plt.plot(X, Y)
    plt.show()
    # # 3-D plot
    # x_test = np.array(x_test)
    # fig = plt.figure()
    # ax = Axes3D(fig)
    # X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    # Y = np.arange(min(x_test[:, 1]) - 2, max(x_test[:, 1]) + 2)
    # X, Y = np.meshgrid(X, Y)
    # Z = (-weight[0] * X - weight[1] * Y - bias) / weight[2]
    # # test-set points
    # for i in range(len(x_test)):
    #     if y_test[i] == 1:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'ro')
    #     else:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'bo')
    # # separating plane in 3-D space
    # ax.plot_surface(X, Y, Z)
    # plt.show()
if __name__ == "__main__":
    time_start = time.time()
    test()
    time_end = time.time()
    print('running time is:', time_end - time_start)
Visualization: (figure omitted)
Improvement: the Pocket Algorithm
For linearly non-separable data, plain PLA never reaches its termination condition. The pocket algorithm therefore tolerates some error: it keeps the best weights and bias found so far in a "pocket". At each candidate update, the stored parameters are replaced only if the new ones achieve higher accuracy; otherwise they are kept. The termination condition is also no longer "no misclassified points" but reaching a maximum number of iterations.
Python code
# -*- coding: utf-8 -*-
"""
Usage: 1. Edit the input data file in the first few lines;
       2. The initial parameters can be changed; in particular, match the
          dimension of the weight vector to the data;
       3. Plotting only supports 2-D or 3-D inputs; higher dimensions
          cannot be visualized.
"""
import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # for the (optional) 3-D plot
import time

data = np.loadtxt(open('data2.csv', 'r', encoding='utf8'), delimiter=",", skiprows=0)
data = data.tolist()  # convert to a Python list
random.shuffle(data)  # shuffle the samples
x_train = []
y_train = []
x_test = []
y_test = []
# 75/25 train/test split
for i in range(int(0.75 * len(data))):
    x_train.append(data[i][0:2])
    y_train.append(data[i][2])
for i in range(int(0.75 * len(data)), len(data)):
    x_test.append(data[i][0:2])
    y_test.append(data[i][2])
# initial parameters
WEIGHT = [0, 0]  # weights; add elements here if the data have more dimensions
BIAS = 0         # bias
LR = 0.1         # learning rate
NUM = 100        # maximum number of iterations

# training and testing
weight = WEIGHT
bias = BIAS
learning_rate = LR

def sign(v):
    if v >= 0:
        return 1
    else:
        return -1
def training():
    global weight
    global bias
    global learning_rate
    pocket = []  # stores the best [w0, w1, b] found so far
    pocket.append(weight[0])
    pocket.append(weight[1])
    pocket.append(bias)
    max_num = 0
    for i in range(NUM):
        x = random.choice(x_train)     # pick a random sample x
        y = y_train[x_train.index(x)]  # look up the label y of x (assumes unique samples)
        x = np.array(x)                # convert x to an array
        y_predict = sign(np.array(weight).dot(x) + bias)
        print("train data: ", x)
        print("true result: %d , predict result: %d" % (y, y_predict))
        if y * y_predict <= 0:         # misclassified: update the parameters
            weight = weight + learning_rate * y * x
            bias = bias + learning_rate * y
            print("update weight:", weight)
            print("update bias:", bias)
        num = 0
        for j in range(len(x_train)):
            if sign(np.array(weight).dot(x_train[j]) + bias) == y_train[j]:
                num += 1
        if num > max_num:              # better than the pocket: replace it
            max_num = num
            pocket.append(weight[0])
            pocket.append(weight[1])
            pocket.append(bias)
    print("======stop training=====, the weight and bias are:", pocket[-3:-1], pocket[-1])
    return pocket[-3:-1], pocket[-1]
def test():
    global x_test
    weight, bias = training()
    y_predict = [0] * len(x_test)
    right_num = 0
    for i in range(len(x_test)):
        y_predict[i] = sign(np.array(weight).dot(np.array(x_test[i])) + bias)
        if y_predict[i] == y_test[i]:
            right_num += 1
        print("test data:", x_test[i])
        print("y_predict:", y_predict[i])
    right_rate = right_num / len(y_test)
    print("right rate is: ", right_rate)
    # plotting: choose the block matching the input dimension;
    # more than 3 dimensions cannot be visualized
    # 2-D plot
    x_test = np.array(x_test)
    # test-set points
    for i in range(len(x_test)):
        if y_test[i] == 1:
            plt.plot(x_test[i][0], x_test[i][1], 'ro')
        else:
            plt.plot(x_test[i][0], x_test[i][1], 'bo')
    # separating line in 2-D space
    X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    Y = (-weight[0] * X - bias) / weight[1]
    plt.plot(X, Y)
    plt.show()
    # # 3-D plot
    # x_test = np.array(x_test)
    # fig = plt.figure()
    # ax = Axes3D(fig)
    # X = np.arange(round(min(x_test[:, 0])) - 2, round(max(x_test[:, 0])) + 2)
    # Y = np.arange(min(x_test[:, 1]) - 2, max(x_test[:, 1]) + 2)
    # X, Y = np.meshgrid(X, Y)
    # Z = (-weight[0] * X - weight[1] * Y - bias) / weight[2]
    # # test-set points
    # for i in range(len(x_test)):
    #     if y_test[i] == 1:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'ro')
    #     else:
    #         plt.plot(x_test[i][0], x_test[i][1], x_test[i][2], 'bo')
    # # separating plane in 3-D space
    # ax.plot_surface(X, Y, Z)
    # plt.show()
if __name__ == "__main__":
    time_start = time.time()
    test()
    time_end = time.time()
    print('running time is:', time_end - time_start)
Visualization: (figure omitted)