P.S. This post is a written record of the code and commentary from the uploader's Bilibili video.
I. Without Convolutional Layers
Preparation: import the packages
import keras
import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Conv2D, MaxPooling2D, Flatten
from keras.optimizers import SGD, Adam
Data processing; the purpose of each step is explained in the comments.
def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    number = 10000
    x_train = x_train[0:number]
    y_train = y_train[0:number]
    x_train = x_train.reshape(x_train.shape[0], 28*28)
    x_test = x_test.reshape(x_test.shape[0], 28*28)
    # one-hot encode the labels into 10-dimensional vectors
    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)
    # normalize pixel values to [0, 1]
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train = x_train / 255
    x_test = x_test / 255
    return (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = load_data()
1 Sigmoid activation + MSE + SGD
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='sigmoid'))
for i in range(10):
    model.add(Dense(units=666, activation='sigmoid'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
This is a brute-force stack of 12 fully connected layers, and it performs terribly: essentially the same as 3 layers (see my video for the details). The accuracy is a pitiful ~10%, no better than random guessing; the network learns essentially nothing.
A quick introduction to sigmoid:
\theta(x)=\frac{1}{1+e^{-x}}
Yes, that's the one. Because it squashes every value into (0, 1), its derivative is small everywhere, so the gradients flowing back through many layers shrink toward zero and the early layers barely get trained. That, roughly, is the vanishing-gradient problem.
Next is MSE, the mean squared error:
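This shrinking effect is easy to see numerically. A short numpy sketch (my own illustration, not from the video): sigmoid's derivative never exceeds 0.25, so even in the best case a 12-layer stack multiplies the backpropagated signal by at most 0.25 per layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0

# Best case: every layer sits at the derivative's maximum (0.25).
# After 12 layers the backpropagated factor is 0.25**12.
factor = sigmoid_grad(0.0) ** 12
print(factor)  # ~5.96e-08: almost no gradient reaches the first layers
```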
J(\theta) = \frac{1}{2m}\sum_{i=0}^{m}(y^i - h_\theta(x^i))^2
That's the one; it measures the squared difference between predictions and labels.
As for SGD, it is stochastic gradient descent: differentiate the MSE above and repeatedly step in the negative gradient direction, gradually approaching a local optimum.
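As a minimal sketch of that idea (my own toy example, not the video's code), here is gradient descent on the MSE loss for a one-parameter linear model h_θ(x) = θx:

```python
import numpy as np

# Toy data generated from y = 3x; theta should move toward 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

theta = 0.0
lr = 0.05
for _ in range(200):
    grad = np.mean((theta * x - y) * x)  # d/dtheta of (1/2m) * sum((h - y)^2)
    theta -= lr * grad                   # step against the gradient
print(theta)  # converges to ~3.0
```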
2 ReLU activation + MSE + SGD
The diagnosis above is that, because of sigmoid's shape, the gradient cannot propagate well back to the earlier layers. So here we switch to the ReLU function:
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
Here I only changed the activation function, and the 3-layer fully connected network's training accuracy jumped to an impressive ~90%.
What is ReLU, and why does it work so much better?
a = \max(0, z)
Yes, that's it. It is vastly simpler than the sigmoid above, yet works vastly better.
So, can we keep optimizing without using convolutional layers?
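Part of the reason it works better is the gradient. A quick numpy check (my own, under the same best-case assumption as the sigmoid sketch earlier): ReLU's derivative is exactly 1 for every positive input, so stacking layers no longer shrinks the backpropagated signal:

```python
import numpy as np

def relu_grad(x):
    # derivative of max(0, z): 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

# For active (positive) units, 12 layers multiply the gradient by 1**12 = 1,
# versus at best 0.25**12 with sigmoid.
print(relu_grad(np.array([2.0])) ** 12)  # [1.]
```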
3 ReLU activation + MSE + SGD, more layers
Hmm... let's brute-force 12 fully connected layers and see what happens.
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
for i in range(10):
    model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
And my result broke: only 26% accuracy. Clearly, more layers is not always better; past a point the result gets worse.
Is this a dead end, then?
4 ReLU activation + MSE + Adam: faster convergence
Here is another optimizer, Adam, an adaptive optimizer. Unlike SGD, it dynamically adjusts the effective step size on its own as training runs, which speeds up the network's convergence.
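For reference, the Adam update rule can be sketched in a few lines of numpy (this is the textbook form of the algorithm, not Keras's internal implementation; the names b1, b2, eps stand for the paper's β₁, β₂, ε):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t counts from 1."""
    m = b1 * m + (1 - b1) * grad       # 1st moment: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2  # 2nd moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, whose gradient is 2*theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)  # ends up near the minimum at 0
```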
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
The same three fully connected layers now reach 95% accuracy; a pleasing result.
5 ReLU activation + categorical_crossentropy + SGD
Back to the earlier question: can we rescue the 12-layer network that trained so badly? Here we swap MSE for categorical_crossentropy, i.e. the cross-entropy loss, and try again.
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
for i in range(10):
    model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
In this setup the final accuracy reaches 95.1%, and the model no longer breaks. So what is cross-entropy? (Honestly, at the time I had no idea why; I just knew it worked better...)
L = -[y\log\hat{y} + (1-y)\log(1-\hat{y})]
That's the one (shown here in its binary form; categorical_crossentropy is the multi-class version). It looks complicated, but the idea is simple: the loss is the negative log of the probability assigned to the correct class, so a confidently wrong prediction incurs a huge loss and a correspondingly large gradient. Combined with softmax, the gradient at the output simplifies to ŷ − y and never flattens out the way MSE's does when the output saturates.
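A tiny numpy comparison (my own illustration) shows the difference for a confidently wrong prediction: cross-entropy keeps growing as the predicted probability approaches the wrong extreme, while MSE is capped near 0.5:

```python
import numpy as np

def binary_ce(y, y_hat):
    # binary cross-entropy for a single prediction
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mse(y, y_hat):
    # squared-error loss for a single prediction
    return 0.5 * (y - y_hat) ** 2

# A confidently wrong prediction: true label 1, predicted probability 0.01.
y, y_hat = 1.0, 0.01
print(binary_ce(y, y_hat))  # ~4.61: large loss, steep gradient
print(mse(y, y_hat))        # ~0.49: loss is capped, learning stalls
```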
6 ReLU activation + categorical_crossentropy + Adam
What if we switch to Adam as well?
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
for i in range(10):
    model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
Final test-set accuracy is about 95.7%, a tiny bit better, while the training set reaches 99.5%: mild overfitting. I couldn't push it any higher.
II. A Dropout Layer to Improve Fitting on Corrupted Data
What if the test data no longer quite matches its labels, unlike the data we trained on?
def load_error_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    number = 10000
    x_train = x_train[0:number]
    y_train = y_train[0:number]
    x_train = x_train.reshape(x_train.shape[0], 28*28)
    x_test = x_test.reshape(x_test.shape[0], 28*28)
    # one-hot encode the labels into 10-dimensional vectors
    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)
    # normalize pixel values to [0, 1]
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train = x_train / 255
    x_test = x_test / 255
    # corrupt the test images with Gaussian noise (mean = pixel value, std = 1),
    # so they no longer match their labels well
    x_test = np.random.normal(x_test)
    return (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = load_error_data()
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=666, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
The result is, unsurprisingly, bad:
Accuracy of Training Set: 0.9895
Accuracy of Testing Set: 0.5082
A small tweak: add Dropout layers.
model = Sequential()
model.add(Dense(input_dim=28*28, units=666, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=666, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=666, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))
# model.add(Dropout(0.5))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
The final result:
Accuracy of Training Set: 0.9873
Accuracy of Testing Set: 0.6099
The test accuracy improved by 10 points, which I think is pretty decent.
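What Dropout(0.5) actually does during training can be sketched in a few lines of numpy (this is inverted dropout, the variant Keras uses): each activation is zeroed with probability 0.5, the survivors are scaled up so the expected output is unchanged, and at test time the layer does nothing:

```python
import numpy as np

def dropout(a, rate=0.5, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training:
        return a  # identity at test time
    mask = (np.random.rand(*a.shape) >= rate).astype(a.dtype)
    return a * mask / (1.0 - rate)  # rescaling keeps E[output] == input

np.random.seed(0)
a = np.ones((1, 10))
out = dropout(a, rate=0.5)
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Because a different random subset of units is silenced on every batch, no unit can rely on any particular co-activation, which is why it curbs overfitting.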
III. Adding Convolutional Layers
After all this, we finally arrive at the protagonist of this post: the convolutional layer.
First, the data processing:
def load_con_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    number = 10000
    x_train = x_train[0:number]
    y_train = y_train[0:number]
    # a convolutional network needs 2-D images, not the flat vectors used before
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    # one-hot encode the labels into 10-dimensional vectors
    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)
    # normalize pixel values to [0, 1]
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train = x_train / 255
    x_test = x_test / 255
    # x_test = np.random.normal(x_test)
    return (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = load_con_data()
Then we build the network and train it:
model = Sequential()  # input: 28*28*1
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))  # -> 26*26*32
# model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3)))  # -> 24*24*64
# model.add(Dropout(0.5))
# model.add(Conv2D(128, (3, 3)))  # -> 22*22*128
# model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
# model.add(Dropout(0.5))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
Accuracy of Training Set: 0.9854
Accuracy of Testing Set: 0.9308
The final accuracy is lower than our earlier 95%? Don't worry; let's add pooling layers.
model = Sequential()  # input: 28*28*1
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))  # -> 26*26*32
model.add(MaxPooling2D(2, 2))  # -> 13*13*32
# model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3)))  # -> 11*11*64
model.add(MaxPooling2D(2, 2))  # -> 5*5*64
# model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
# model.add(Dropout(0.5))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
print('Accuracy of Testing Set:', score[1])
Accuracy of Training Set: 0.997
Accuracy of Testing Set: 0.9777
Already 97%! Now let's add Dropout layers as well.
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))  # 320 params, 28*28*1 -> 26*26*32
model.add(MaxPooling2D(2, 2))  # -> 13*13*32
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3)))  # 18496 params, -> 11*11*64
model.add(MaxPooling2D(2, 2))  # -> 5*5*64
model.add(Dropout(0.25))
model.add(Flatten())  # -> 1600
model.add(Dense(units=100, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))
# model.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_train, y_train)
# print('Total loss on Testing Set:', score[0])
print('Accuracy of Training Set:', score[1])
score = model.evaluate(x_test, y_test)
# print('Total loss on Testing Set:', score[0])
print('Accuracy of Testing Set:', score[1])
Accuracy of Training Set: 0.9947833333333334
Accuracy of Testing Set: 0.9901
That's as far as I got this time. There may well be better approaches; we'll leave them for another day.
So what is a convolutional layer? It slides a small kernel across the input and, at each position, multiplies the overlapping values elementwise and sums them, producing a feature map.
As for the pooling layer, it summarizes each small window of the feature map with a single value; max pooling keeps the maximum of the window.
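Both operations can be sketched in bare-bones numpy (my own single-channel illustration, ignoring padding, strides, and channels):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid convolution (really cross-correlation, as in Keras)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the window with the kernel, then sum
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    h, w = img.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i * size:(i + 1) * size,
                            j * size:(j + 1) * size].max()
    return out

img = np.arange(16.0).reshape(4, 4)
feat = conv2d(img, np.ones((3, 3)))  # 4x4 -> 2x2, like 28 -> 26 above
print(max_pool(feat))                # 2x2 -> 1x1: [[90.]]
```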
A more detailed introduction will have to wait until I have time; that's all for this post.