Last year I spent four months working through Andrew Ng's deep learning course, so I have some grounding in deep learning, but I have never studied machine learning systematically. A friend recommended two classic machine learning textbooks, Li Hang's 统计学习方法 (Statistical Learning Methods) and Zhou Zhihua's "watermelon book" (机器学习), so I bought both online. That was two months ago; I am now starting with Li Hang's book, which my friend said is the easier entry point. I am following the same routine I used for deep learning:
- Read the book and take notes
- Do the exercises
- Write everything up on the blog
Perceptron: a linear model for binary classification. Its input is an instance's feature vector and its output is the instance's class, taking the two values +1 and -1.
Key points:
- 1. When the training data are linearly separable, the original form of the perceptron learning algorithm converges in a finite number of iterations. The convergence proof (Novikoff's theorem) is worth trying to derive yourself; without the book's hints, proving convergence from scratch would be very hard. When the training data are not linearly separable, the algorithm does not converge and the iterations oscillate.
- 2. Computing wx + b can be viewed as a change of coordinates that maps the separating line onto the x-axis, so the two sides of the line become the regions above and below the axis, i.e. positive and negative.
- 3. After that transformation, wx + b plays the role of the x-axis, so each point's value of wx + b is essentially its distance to the line; for points below the axis we take the absolute value |wx + b|. To learn the classifier we sum |wx + b| over all misclassified points and minimize. But that objective alone is trivial to game: shrink w and b by the same factor, say to 0.5w and 0.5b, and the line stays exactly the same while the objective halves; keep shrinking and it goes to zero. So we add a constraint: divide the expression by the norm ||w||, i.e. normalize by w's length. If w and b shrink proportionally, ||w|| shrinks by the same factor and the ratio does not move at all. Before dividing by the norm, |wx + b| is called the functional margin; after dividing, it is the geometric margin, which is the actual physical distance and is invariant to rescaling. When machine learning needs a distance it is usually the geometric margin that is used; otherwise the problem has no well-defined solution.
- 4. The dual form of the perceptron does not seem to offer anything special beyond precomputing the Gram matrix, which speeds up the inner products.
- 5. The perceptron cannot represent XOR (see my earlier blog post). And not just the perceptron: no linear classification model can solve the XOR problem.
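Point 3 above can be checked numerically: scaling w and b by the same factor changes the functional margin but leaves the geometric margin untouched. A minimal sketch with a made-up point and line:

```python
import numpy as np

# Hypothetical 2-D example: a point x with label y and a separating line w·x + b = 0.
w = np.array([2.0, 1.0])
b = -3.0
x = np.array([4.0, 1.0])
y = 1

functional = y * (np.dot(w, x) + b)          # functional margin: y(w·x + b)
geometric = functional / np.linalg.norm(w)   # geometric margin: divide by ||w||

# Shrink w and b by the same factor: the line is unchanged,
# the functional margin halves, but the geometric margin does not move.
w2, b2 = 0.5 * w, 0.5 * b
functional2 = y * (np.dot(w2, x) + b2)
geometric2 = functional2 / np.linalg.norm(w2)

print(functional, functional2)   # 6.0 3.0
print(geometric, geometric2)     # both ~2.683
```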
For the details, see the book's own exposition; this chapter's material is not very hard. This blog post also gives a good introduction to the perceptron; points 2 and 3 above come from it.
The dataset for this exercise is MNIST: grayscale images of handwritten digits, 28×28 pixels each, with 60,000 training images and 10,000 test images. A detailed MNIST introduction is available online.
1. Load the data
The data have already been downloaded from the official site and unzipped.
import numpy as np
import os
# Training set (IDX files are binary, so open them in 'rb' mode)
with open('./minist_data/train-images.idx3-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
X_train = loaded[16:].reshape((60000, 784))  # skip the 16-byte header
X_train = X_train.astype(np.int32)
print('X_train:', X_train.shape)  # (60000, 784)
with open('./minist_data/train-labels.idx1-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
y_train = loaded[8:]  # skip the 8-byte header
y_train = y_train.astype(np.int32)
print('y_train:', y_train.shape)  # (60000,)
# Test set
with open('./minist_data/t10k-images.idx3-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
X_test = loaded[16:].reshape((10000, 784))
X_test = X_test.astype(np.int32)
print('X_test:', X_test.shape)  # (10000, 784)
with open('./minist_data/t10k-labels.idx1-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
y_test = loaded[8:]
y_test = y_test.astype(np.int32)
print('y_test:', y_test.shape)  # (10000,)
X_train: (60000, 784)
y_train: (60000,)
X_test: (10000, 784)
y_test: (10000,)
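The hard-coded 16-byte and 8-byte offsets above come from the IDX file format. As a sketch, the header can also be parsed explicitly with `struct` (the magic number 2051 marks an idx3 image file); the helper name here is my own:

```python
import struct
import numpy as np

def read_idx_images(path):
    # The image-file header is four big-endian 32-bit integers:
    # magic number (2051 for idx3 images), image count, rows, cols.
    with open(path, 'rb') as f:
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        assert magic == 2051, 'not an idx3 image file'
        data = np.fromfile(f, dtype=np.uint8).reshape(n, rows * cols)
    return data
```

Reading the header this way also validates the file and avoids hard-coding the image count.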
2. Inspect the data
Check the data's type, shape, and contents, and display some images.
import matplotlib.pyplot as plt
%matplotlib inline
print(type(X_train), type(y_train))
print(X_train[0])
print(y_train[0])
img = X_train[0].reshape(28, 28)
plt.imshow(img, cmap='Greys', interpolation='nearest')
plt.show()
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 3 18 18 18 126 136 175 26 166 255
247 127 0 0 0 0 0 0 0 0 0 0 0 0 30 36 94 154
170 253 253 253 253 253 225 172 253 242 195 64 0 0 0 0 0 0
0 0 0 0 0 49 238 253 253 253 253 253 253 253 253 251 93 82
82 56 39 0 0 0 0 0 0 0 0 0 0 0 0 18 219 253
253 253 253 253 198 182 247 241 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 80 156 107 253 253 205 11 0 43 154
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 14 1 154 253 90 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 139 253 190 2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 11 190 253 70 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 241
225 160 108 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 81 240 253 253 119 25 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 45 186 253 253 150 27 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 16 93 252 253 187
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 249 253 249 64 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 46 130 183 253
253 207 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 39 148 229 253 253 253 250 182 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 24 114 221 253 253 253
253 201 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 23 66 213 253 253 253 253 198 81 2 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 18 171 219 253 253 253 253 195
80 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55 172 226 253 253 253 253 244 133 11 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 136 253 253 253 212 135 132 16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
5
import matplotlib.pyplot as plt
%matplotlib inline
# One example of each digit 0-9
fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(10):
    img = X_train[y_train == i][0].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
plt.show()
# 25 examples of the digit 7
fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(25):
    img = X_train[y_train == 7][i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
plt.show()
3. Binary classification with the perceptron on digits 0 and 1
# Binary classification: digit 0 vs digit 1
X_train1 = X_train[y_train == 0]
X_train2 = X_train[y_train == 1]
print(X_train1.shape)
print(X_train2.shape)
X_train12 = np.vstack([X_train1, X_train2])
print(X_train12.shape)
y12 = np.ones((X_train12.shape[0], 1))
y12[:X_train1.shape[0], :] = -1  # label digit 0 as -1, digit 1 as +1
all_data = np.hstack([X_train12, y12])  # append the labels as the last column
print(all_data.shape)
print('*' * 30)
fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(10):
    img = X_train1[i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
(5923, 784)
(6742, 784)
(12665, 784)
(12665, 785)
******************************
# Shuffle the data, then take the first 80% as the training set
# and the remaining 20% as the test set
np.random.shuffle(all_data)
split = int(0.8 * all_data.shape[0])
XX_train = all_data[:split, :-1]
yy_train = all_data[:split, -1]
XX_test = all_data[split:, :-1]
yy_test = all_data[split:, -1]
print(XX_train.shape)
print(yy_train.shape)
print(XX_test.shape)
print(yy_test.shape)
print(yy_test[:10])
(10132, 784)
(10132,)
(2533, 784)
(2533,)
[-1. -1. -1. -1. 1. -1. 1. 1. -1. -1.]
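A hedged alternative to stacking the labels onto the features before shuffling (my own sketch, not the code above): shuffle an index permutation instead, so X and y stay aligned while keeping their own dtypes, and a fixed seed makes the split reproducible. Toy stand-in arrays are used here:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(20).reshape(10, 2).astype(np.int32)  # toy stand-in features
y = np.arange(10)                                  # toy stand-in labels

idx = rng.permutation(len(X))      # one shared permutation keeps rows paired
split = int(0.8 * len(X))
X_tr, y_tr = X[idx[:split]], y[idx[:split]]
X_te, y_te = X[idx[split:]], y[idx[split:]]
print(X_tr.shape, X_te.shape)      # (8, 2) (2, 2)
```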
The perceptron:
Model: the hypothesis space of the perceptron consists of all linear classification models defined on the feature space, i.e. the function set $\{f \mid f(x) = w\cdot x + b\}$.
Strategy: minimize the total distance from the misclassified points to the separating hyperplane S, where M denotes the set of points misclassified by S.
$$\min_{w,b}\ L(w,b) = -\sum_{x_i\in M} y_i(w\cdot x_i + b)$$
$$\nabla_w L(w,b) = -\sum_{x_i\in M} y_i\cdot x_i$$
$$\nabla_b L(w,b) = -\sum_{x_i\in M} y_i$$
Algorithm: minimize the loss function by stochastic gradient descent, with learning rate $r$. A point $(x_i, y_i)$ is misclassified when
$$y_i(w\cdot x_i + b) \le 0$$
For each misclassified point, apply the updates
$$w \leftarrow w + r\, y_i\cdot x_i$$
$$b \leftarrow b + r\, y_i$$
import time

def perceptron_pars(X_train, y_train, num_iterations, r):
    n, m = X_train.shape
    w = np.zeros((1, m))
    b = 0
    for i in range(num_iterations):
        for j in range(n):
            # update on every misclassified point: y_i(w·x_i + b) <= 0
            if y_train[j] * (np.dot(w, X_train[j].T) + b) <= 0:
                w = w + r * y_train[j] * X_train[j]
                b = b + r * y_train[j]
        if not i % (num_iterations // 10):
            print('Run Percent:', float(i / num_iterations))
    return w, b

def cal_accuracy(X_test, y_test, w, b):
    n, m = X_test.shape
    right_list = []
    for j in range(n):
        # a point is correctly classified only when y_i(w·x_i + b) > 0;
        # using >= here would count points lying exactly on the hyperplane
        # as correct, contradicting the training rule that treats them as
        # misclassified
        if y_test[j] * (np.dot(w, X_test[j].T) + b) > 0:
            right_list.append(1)
    accruRate = len(right_list) / n
    return accruRate

def model(X_train, y_train, X_test, y_test, num_iterations, r):
    start = time.time()
    w, b = perceptron_pars(X_train, y_train, num_iterations, r)
    train_accruRate = cal_accuracy(X_train, y_train, w, b)
    test_accruRate = cal_accuracy(X_test, y_test, w, b)
    end = time.time()
    print('train_accruRate:', train_accruRate)
    print('test_accruRate:', test_accruRate)
    print('Run time :', (end - start))
    return w, b
num_iterations = 30
w,b = model(XX_train,yy_train,XX_test,yy_test,num_iterations,r = 0.0001)
Run Percent: 0.0
Run Percent: 0.1
Run Percent: 0.2
Run Percent: 0.3
Run Percent: 0.4
Run Percent: 0.5
Run Percent: 0.6
Run Percent: 0.7
Run Percent: 0.8
Run Percent: 0.9
train_accruRate: 1.0
test_accruRate: 0.9980260560600079
Run time : 3.405052661895752
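Point 4 earlier noted that the dual form mainly buys a precomputed Gram matrix. A minimal sketch of that form on the book's small worked example (positive points x1=(3,3), x2=(4,3), negative point x3=(1,1)); the scan order over the points is my own choice, and a different order could reach a different but equally valid separator:

```python
import numpy as np

X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1, 1, -1])
n = len(X)
G = X @ X.T            # Gram matrix, computed once up front
alpha = np.zeros(n)    # alpha[i] counts how often point i triggered an update
b, r = 0.0, 1.0

changed = True
while changed:
    changed = False
    for j in range(n):
        # misclassification test expressed through the Gram matrix
        if y[j] * (np.sum(alpha * y * G[:, j]) + b) <= 0:
            alpha[j] += r
            b += r * y[j]
            changed = True

w = (alpha * y) @ X    # recover w = sum_i alpha_i * y_i * x_i
print(w, b)            # converges to w = (1, 1), b = -3
```

The inner test never touches the raw feature vectors, which is why precomputing G pays off when inner products are expensive.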
4. Binary classification with digits 0-4 as one class and 5-9 as the other
# With this split, the error rate is still around 0.4 after 1000 iterations,
# so the two classes are very likely not linearly separable
labels_train = []
labels_test = []
for i in range(len(y_train)):
    if y_train[i] >= 5:
        labels_train.append(1)
    else:
        labels_train.append(-1)
for i in range(len(y_test)):
    if y_test[i] >= 5:
        labels_test.append(1)
    else:
        labels_test.append(-1)
print(labels_train[:5])
print(labels_test[:5])
num_iterations = 30
w, b = model(X_train, labels_train, X_test, labels_test, num_iterations, r=0.0001)
[1, -1, -1, -1, 1]
[1, -1, -1, -1, -1]
Run Percent: 0.0
Run Percent: 0.1
Run Percent: 0.2
Run Percent: 0.3
Run Percent: 0.4
Run Percent: 0.5
Run Percent: 0.6
Run Percent: 0.7
Run Percent: 0.8
Run Percent: 0.9
train_accruRate: 0.8119166666666666
test_accruRate: 0.8029
Run time : 19.708739757537842
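Since this 0-4 vs 5-9 split is likely not linearly separable, the plain perceptron keeps oscillating. One common workaround, not covered in the book, is the "pocket" variant sketched below (my own sketch, not the author's code): it keeps the best weights seen so far, judged by training accuracy, so the final answer at least never gets worse than the best intermediate one.

```python
import numpy as np

def pocket_perceptron(X, y, num_iterations=50, r=0.1):
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    best_w, best_b = w.copy(), b
    best_acc = np.mean((X @ w + b) * y > 0)
    for _ in range(num_iterations):
        for j in range(n):
            if y[j] * (X[j] @ w + b) <= 0:       # same update rule as before
                w = w + r * y[j] * X[j]
                b = b + r * y[j]
                acc = np.mean((X @ w + b) * y > 0)
                if acc > best_acc:               # pocket the best w, b so far
                    best_acc = acc
                    best_w, best_b = w.copy(), b
    return best_w, best_b, best_acc
```

Re-scoring the whole training set after every single update costs O(n) per update, so this sketch is only practical for small data; a real implementation would check periodically.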
5. Perceptron practice on synthetic data
import numpy as np
np.random.seed(2)
# Class 1: 50 points in the box [2, 10] x [2, 10]
data1 = np.random.uniform(2, 10, (2, 50))
# Class 2: 50 points with x in [-8, -2] and y in [-4, 4]
x2 = np.random.uniform(-8, -2, 50)
y2 = np.random.uniform(-4, 4, 50)
data2 = np.vstack([x2, y2])
plt.figure(figsize=(10, 6), dpi=80)
plt.scatter(data1[0], data1[1], alpha=0.5)
plt.scatter(data2[0], data2[1], alpha=0.5)
# Move the axis spines so they cross at the origin
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.show()
X_train = np.hstack([data1, data2]).T
print(X_train.shape)
y_train = np.ones((1, 100))
y_train[:, 50:] = -1
y_train = y_train.flatten()
print(y_train)
num_iterations = 10
w, b = model(X_train, y_train, X_train, y_train, num_iterations, r=0.1)
(100, 2)
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. -1. -1. -1. -1.
-1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
-1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
Run Percent: 0.0
Run Percent: 0.1
Run Percent: 0.2
Run Percent: 0.3
Run Percent: 0.4
Run Percent: 0.5
Run Percent: 0.6
Run Percent: 0.7
Run Percent: 0.8
Run Percent: 0.9
train_accruRate: 1.0
test_accruRate: 1.0
Run time : 0.035977840423583984
plt.figure(figsize=(6, 6), dpi=80)
plt.scatter(data1[0], data1[1], alpha=0.5)
plt.scatter(data2[0], data2[1], alpha=0.5)
x = np.linspace(-3, 3, 50)
w = w.flatten()
# Separating line w0*x + w1*y + b = 0, solved for y;
# the small epsilon guards against division by zero when w1 == 0
y = (-w[0] * x - b) / (w[1] + 0.000001)
plt.plot(x, y, color='red')
# Move the axis spines so they cross at the origin
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.show()