Implementing ResNet Variants in TensorFlow and Testing on CIFAR-10
After working on this for a while, I have learned the relevant ResNet background and finished a working implementation, running into quite a few problems along the way. This post records all of it.
I. Why ResNet works
ResNet is an optimization built on top of plain convolutional networks. When we stack more convolutional layers, we expect the model to perform better, but in practice the opposite happens: as the network gets deeper, performance degrades instead of improving. ResNet was designed to solve this degradation problem.
Because batch normalization is already in place, the degradation cannot be blamed on overfitting or vanishing gradients; it is more likely that optimization itself hits a bottleneck. In every round of training, while the model keeps improving, erroneous updates also accumulate. When the network is shallow, this negative effect is small relative to the positive effect of optimization. But as training continues and the network deepens, the gains from optimization can no longer offset the accumulated errors, and model performance degrades. What ResNet addresses, then, is stalled optimization.
ResNet's answer is to plan for the worst case: when we deepen the network, the model must not get worse, so it has to be able to represent at least the identity mapping. In other words, ResNet guarantees that the positive effect of optimization is at least as large as the negative effect of accumulated errors. By ruling out much of the training behavior that causes degradation, it lets us stack far deeper networks in search of a better model, and that is exactly where its advantage lies.
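The worst-case guarantee above can be sketched in a few lines: a residual block computes F(x) + x, so even if the learned branch F contributes nothing, the input passes through unchanged (a minimal NumPy sketch of the idea, not the actual network):

```python
import numpy as np

def residual_block(x, branch):
    # The shortcut adds the input back, so the block can never do
    # worse than the identity mapping.
    return branch(x) + x

x = np.array([1.0, 2.0, 3.0])
# A branch that has learned nothing useful can output zeros...
dead_branch = lambda t: np.zeros_like(t)
# ...and the block still behaves as the identity: y == x.
y = residual_block(x, dead_branch)
```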
II. Implementation (ResNet-18 as the example)
1. Core part: the residual unit
'''
Residual unit for ResNet-18 / ResNet-34 ----- v1 (post-activation)
'''
def residual(self, inputs, num_channels, training, use_1x1conv=False, strides=1):
    # First 3x3 conv (may downsample via strides), then BN and ReLU
    outputs = tf.layers.conv2d(inputs=inputs, filters=num_channels, kernel_size=3, padding='same',
                               kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                               strides=strides, activation=None, use_bias=False)
    outputs = tf.layers.batch_normalization(inputs=outputs, training=training)
    outputs = tf.nn.relu(outputs)
    # Second 3x3 conv plus BN; no ReLU until after the shortcut addition
    outputs = tf.layers.conv2d(inputs=outputs, filters=num_channels, kernel_size=3, padding='same',
                               kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                               strides=1, activation=None, use_bias=False)
    outputs = tf.layers.batch_normalization(inputs=outputs, training=training)
    # Projection shortcut: 1x1 conv (plus BN) to match channels and stride
    if use_1x1conv:
        inputs = tf.layers.conv2d(inputs=inputs, filters=num_channels, kernel_size=1, padding='same',
                                  kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                                  strides=strides, activation=None, use_bias=False)
        inputs = tf.layers.batch_normalization(inputs=inputs, training=training)
    result = tf.add(inputs, outputs)
    return tf.nn.relu(result)
2. Assembling the residual blocks
def block(self, inputs, num_channels, num_residuals, training, first_block=False):
    outputs = inputs
    for i in range(num_residuals):
        # The first unit of every stage except the first halves the spatial
        # size and changes the channel count, so it needs a projection shortcut.
        if i == 0 and not first_block:
            outputs = self.residual(outputs, num_channels, training=training, use_1x1conv=True, strides=2)
        else:
            outputs = self.residual(outputs, num_channels, training=training)
    return outputs
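For a 32x32 CIFAR-10 input, these blocks produce a fixed downsampling pattern; a small pure-Python sketch (assuming 'same' padding and the strides used above) makes the resulting feature-map sizes explicit:

```python
def stage_shapes(size=32, channels=(64, 128, 256, 512)):
    """(spatial size, channels) after the stem max-pool and each of the 4 stages."""
    size //= 2                  # 3x3 max pool, stride 2: 32 -> 16
    shapes = []
    for i, c in enumerate(channels):
        if i > 0:               # every stage except the first downsamples by 2
            size //= 2
        shapes.append((size, c))
    return shapes

print(stage_shapes())  # [(16, 64), (8, 128), (4, 256), (2, 512)]
```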
Below are the CIFAR-10 image preprocessing and training parts:
3. Image preprocessing (pure Python; feel free to replace it)
The data set used is the CIFAR-10 release provided on Kaggle.
class Datamanage:
    def image_manage(self, img_file, flag):
        if flag == 'train':
            img = Image.open('train/' + img_file)
            # Random crop: upscale to 40x40, then take a random 32x32 window
            img_size = img.resize((40, 40), Image.ANTIALIAS)
            img_arr = np.array(img_size)
            a = random.randint(0, 8)
            b = random.randint(0, 8)
            cropped = img_arr[a:a+32, b:b+32]
            # Random horizontal flip
            f = random.randint(0, 1)
            if f == 1:
                cropped = cv2.flip(cropped, 1)
            img_result = np.reshape(cropped, (1, -1))
        else:
            img = Image.open('train/' + img_file)  # note: this path differs between training and testing
            # At test time: upscale to 40x40, then take the central 32x32 crop
            img_size = img.resize((40, 40), Image.ANTIALIAS)
            img_arr = np.array(img_size)
            cropped = img_arr[4:36, 4:36]
            img_result = np.reshape(cropped, (1, -1))
        return img_result
    def read_and_convert(self, filelist, flag):
        # Stack the per-image rows into one (N, 32*32*3) array; the train
        # and test branches were identical apart from the flag, so pass it through.
        data = self.image_manage(filelist[0], flag)
        for img in filelist[1:]:
            data = np.concatenate((data, self.image_manage(img, flag)), axis=0)
        return data
    def label_manage(self, csv_path, num_classes):
        # Convert the string labels to one-hot vectors
        label = self.csv_read(csv_path)
        total_y = np.zeros((len(label), num_classes))
        for i in range(len(label)):
            if label[i] == 'airplane': total_y[i][0] = 1
            elif label[i] == 'automobile': total_y[i][1] = 1
            elif label[i] == 'bird': total_y[i][2] = 1
            elif label[i] == 'cat': total_y[i][3] = 1
            elif label[i] == 'deer': total_y[i][4] = 1
            elif label[i] == 'dog': total_y[i][5] = 1
            elif label[i] == 'frog': total_y[i][6] = 1
            elif label[i] == 'horse': total_y[i][7] = 1
            elif label[i] == 'ship': total_y[i][8] = 1
            elif label[i] == 'truck': total_y[i][9] = 1
        return total_y
    def csv_read(self, data_path):
        label = []
        with open(data_path, "r") as f:
            reader = csv.reader(f)
            for row in reader:
                label.append(row[1])
        # Drop the header row and reshape into a column
        new_label = np.reshape(label[1:], (-1, 1))
        return new_label
    def csv_write(self, data):
        # Use a context manager so the file is flushed and closed properly
        with open('result.csv', 'w', encoding='utf-8', newline='') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow(["id", "label"])
            for i in range(len(data)):
                csv_writer.writerow([str(i+1), data[i]])
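The long if/elif chain in `label_manage` can also be collapsed with a class-name lookup; an equivalent standalone sketch (same mapping, with a hypothetical helper name):

```python
import numpy as np

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

def one_hot(labels, num_classes=10):
    # Same result as the if/elif chain: row i gets a 1 in the column
    # given by the label's position in CLASSES.
    total_y = np.zeros((len(labels), num_classes))
    for i, name in enumerate(labels):
        total_y[i][CLASSES.index(name)] = 1
    return total_y
```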
4. Training
def train():
    '''
    Hyperparameters
    '''
    num_classes = 10              # output size
    input_size = 32*32*3          # input size
    training_iterations = 30000   # number of training steps
    weight_decay = 2e-4           # weight-decay coefficient
    ver = 2                       # ResNet version, 1 or 2
    manage = Datamanage()
    resnet = Resnet()
    '''
    Data loading
    '''
    path = 'train/'
    data = os.listdir(path)
    data.sort(key=lambda x: int(x.split('.')[0]))
    label = manage.label_manage('train.csv', num_classes)
    x_train = data[:49000]; x_test = data[49000:]
    y_train = label[:49000]; y_test = label[49000:]
    y_test = [np.argmax(x) for x in y_test]
    '''
    Network construction
    '''
    X = tf.placeholder(tf.float32, shape=[None, input_size], name='x')
    Y = tf.placeholder(tf.float32, shape=[None, num_classes], name='y')
    training = tf.placeholder(tf.bool, name="training")
    input_images = tf.reshape(X, [-1, 32, 32, 3])
    input_images = tf.image.per_image_standardization(input_images)  # per-image standardization
    inputs = tf.layers.conv2d(inputs=input_images, filters=64, kernel_size=3, strides=1, padding='same',
                              activation=None, use_bias=False)
    if ver == 1:
        # v1 (post-activation): BN and ReLU right after the stem convolution
        inputs = tf.nn.relu(tf.layers.batch_normalization(inputs, training=training))
    max_pool = tf.layers.max_pooling2d(inputs, pool_size=3, strides=2, padding='same')
    '''
    resnet18  [2, 2, 2, 2]
    resnet34  [3, 4, 6, 3]
    resnet50  [3, 4, 6, 3]
    resnet101 [3, 4, 23, 3]
    resnet152 [3, 8, 36, 3]
    '''
    num_residuals = [2, 2, 2, 2]
    blk = resnet.block(max_pool, 64, num_residuals[0], training=training, first_block=True)
    blk = resnet.block(blk, 128, num_residuals[1], training=training)
    blk = resnet.block(blk, 256, num_residuals[2], training=training)
    blk = resnet.block(blk, 512, num_residuals[3], training=training)
    if ver == 2:
        # v2 (pre-activation): a final BN and ReLU on the last block's output
        blk = tf.nn.relu(tf.layers.batch_normalization(blk, training=training))
    pool = tf.layers.average_pooling2d(blk, pool_size=2, strides=2, padding='same')
    pool = tf.layers.flatten(pool)  # flatten to (N, 512) before the dense layer
    final_opt = tf.layers.dense(inputs=pool, units=10)
    tf.add_to_collection('pred_network', final_opt)
    # Learning-rate decay
    global_step = tf.Variable(0, trainable=False)
    '''
    Piecewise-constant learning rate
    '''
    boundaries = [10000, 15000, 20000, 25000]
    values = [0.1, 0.05, 0.01, 0.005, 0.001]
    learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
    '''
    Exponentially decaying learning rate (alternative)
    '''
    # initial_learning_rate = 0.002  # initial learning rate
    # learning_rate = tf.train.exponential_decay(learning_rate=initial_learning_rate, global_step=global_step, decay_steps=200, decay_rate=0.95)
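What `tf.train.piecewise_constant` computes can be mirrored in plain Python, which makes the schedule easy to sanity-check without building a graph (a sketch of the same boundaries and values as above):

```python
def piecewise_lr(step, boundaries=(10000, 15000, 20000, 25000),
                 values=(0.1, 0.05, 0.01, 0.005, 0.001)):
    # values[i] applies while step <= boundaries[i]; past the last
    # boundary, the final value is used.
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    return values[-1]

print(piecewise_lr(0), piecewise_lr(12000), piecewise_lr(30000))  # 0.1 0.05 0.001
```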
    # Cross-entropy loss on the output layer
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=final_opt))
    # L2 weight decay over all trainable variables
    l2_loss = weight_decay * tf.add_n([tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])
    tf.summary.scalar('l2_loss', l2_loss)
    loss = loss + l2_loss
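The weight-decay term can be checked by hand: `tf.nn.l2_loss(v)` is sum(v**2) / 2, summed over every trainable variable and scaled by the coefficient (a NumPy sketch of the same arithmetic):

```python
import numpy as np

def l2_penalty(variables, weight_decay=2e-4):
    # weight_decay * sum over variables of sum(v**2) / 2, matching tf.nn.l2_loss
    return weight_decay * sum(np.sum(v ** 2) / 2.0 for v in variables)

weights = [np.array([1.0, 2.0]), np.array([3.0])]
penalty = l2_penalty(weights)  # 2e-4 * (1 + 4 + 9) / 2 = 0.0014
```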
    # Optimizer; the control dependency makes the BN moving-average
    # updates (UPDATE_OPS) run before each training step
    optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        opt = optimizer.minimize(loss, global_step=global_step)
    # Initialization
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    '''
    Training loop
    '''
    for i in range(training_iterations):
        start_step = i*128 % 49000
        stop_step = start_step + 128
        batch_x, batch_y = x_train[start_step:stop_step], y_train[start_step:stop_step]
        batch_x = manage.read_and_convert(batch_x, 'train')  # preprocess one batch at a time
        training_loss = sess.run([opt, loss, learning_rate], feed_dict={X: batch_x, Y: batch_y, training: True})
        if i % 10 == 0:
            test_data = manage.read_and_convert(x_test[:1000], 'test')
            result = sess.run(final_opt, feed_dict={X: test_data[:1000], training: False})
            result = [np.argmax(x) for x in result]
            print("step : %d, training loss = %g, accuracy_score = %g, learning_rate = %g"
                  % (i, training_loss[1], metrics.accuracy_score(y_test[:1000], result), training_loss[2]))
            if metrics.accuracy_score(y_test[:1000], result) > 0.92:
                break
    saver.save(sess, './data/resnet.ckpt')  # save the model
5. Reusing the saved model at test time
def test():
    path = "test/"
    manage = Datamanage()
    filelist = os.listdir(path)
    filelist.sort(key=lambda x: int(x.split('.')[0]))
    saver = tf.train.import_meta_graph("./data/resnet.ckpt.meta")
    results = []
    labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
              'dog', 'frog', 'horse', 'ship', 'truck']
    with tf.Session() as sess:
        saver.restore(sess, "./data/resnet.ckpt")
        graph = tf.get_default_graph()
        # Recover the placeholders and the logits saved in the collection
        x = graph.get_operation_by_name("x").outputs[0]
        y = tf.get_collection("pred_network")[0]
        training = graph.get_operation_by_name("training").outputs[0]
        for i in range(len(filelist) // 100):
            s = i*100; e = (i+1)*100
            data = manage.read_and_convert(filelist[s:e], 'test')
            result = sess.run(y, feed_dict={x: data, training: False})
            # Map each argmax index back to its class name
            results.extend(labels[np.argmax(r)] for r in result)
            print("num=====", i*100)
    manage.csv_write(results)
    print('done!!')
Full code: https://github.com/wulewule/neural/blob/master/resnet.py
Results (these represent a baseline only; further tuning is up to you):
III. Problems encountered
1. loss = 2.3, accuracy_score = 0.1
An accuracy of 0.1 is no better than random guessing, and a loss that never decreases means the model is not converging at all. This problem blocked me for a long time. I tried many remedies found online: standardizing and normalizing the data, adjusting the learning rate, changing the weight initialization, and none of them helped. I then ran the same model on MNIST and it converged normally, while an ordinary machine-learning algorithm on CIFAR-10 still scored 0.1, so the data pipeline had to be at fault. Tracking it down, the problem was here:
path = 'train/'
data = os.listdir(path)
I used this call to list the training images but overlooked that the returned list is in arbitrary order, so during training the images and labels did not match up. The fix:
path = 'train/'
data = os.listdir(path)
data.sort(key=lambda x: int(x.split('.')[0]))
Sorting by the numeric file name solved the problem.
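The fix can be checked on a toy list. Note that a plain lexicographic sort would still be wrong for numeric file names ('10.png' sorts before '2.png'); sorting on the integer prefix restores the intended order:

```python
files = ['10.png', '100.png', '2.png', '1.png']   # arbitrary order, as os.listdir may return
files.sort(key=lambda x: int(x.split('.')[0]))    # sort by the numeric part of the name
print(files)  # ['1.png', '2.png', '10.png', '100.png']

# A plain string sort would interleave them lexicographically:
assert sorted(['10.png', '2.png']) == ['10.png', '2.png']
```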
2. Large data set, slow preprocessing, slow training start-up
Preprocessing all 50,000 images up front took very long, and when the preprocessing function itself was slow the program appeared to hang. After some experimentation, preprocessing only one batch of data at a time made a dramatic difference:
'''
Data loading
'''
path = 'train/'
data = os.listdir(path)
data.sort(key=lambda x: int(x.split('.')[0]))
label = manage.label_manage('train.csv', num_classes)
x_train = data[:49000]; x_test = data[49000:]
y_train = label[:49000]; y_test = label[49000:]
y_test = [np.argmax(x) for x in y_test]
'''
Training loop
'''
for i in range(training_iterations):
    start_step = i*128 % 49000
    stop_step = start_step + 128
    batch_x, batch_y = x_train[start_step:stop_step], y_train[start_step:stop_step]
    batch_x = manage.read_and_convert(batch_x, 'train')  # decode and augment only this batch
    training_loss = sess.run([opt, loss, learning_rate], feed_dict={X: batch_x, Y: batch_y, training: True})
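The index arithmetic in that loop can be isolated for clarity (a sketch with a hypothetical helper name; Python slicing simply truncates when the stop index runs past the end of the list):

```python
def batch_bounds(step, batch_size=128, train_size=49000):
    # Start/stop indices for the step-th batch, wrapping over the training set.
    start = step * batch_size % train_size
    return start, start + batch_size

print(batch_bounds(0))    # (0, 128)
print(batch_bounds(1))    # (128, 256)
print(batch_bounds(383))  # (24, 152) -- 383*128 = 49024 wraps past 49000
```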
3. 91% training accuracy but only 87% on the test set
Being inexperienced, I agonized over this one for a long time too. Convinced it was overfitting, I added dropout in various places, changed the learning rate, added more weight decay, and so on, all to no effect. I then read that ResNet does not combine well with dropout, but removing dropout still left a gap. Facing the same architecture yet unable to reproduce other people's results was genuinely frustrating. After going through several blog posts, I finally found the fix, a single line of difference:
Before:
if use_1x1conv:
    inputs = tf.layers.conv2d(inputs=inputs, filters=num_channels, kernel_size=1, padding='same',
                              kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                              strides=strides, activation=None, use_bias=False)
After:
if use_1x1conv:
    inputs = tf.layers.conv2d(inputs=inputs, filters=num_channels, kernel_size=1, padding='same',
                              kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                              strides=strides, activation=None, use_bias=False)
    inputs = tf.layers.batch_normalization(inputs=inputs, training=training)
The reference implementation I was following did not have this extra batch-normalization layer on the shortcut, but if it works, it works.
IV. Summary
This took a long time, but the payoff was also large, and I learned a lot about how to debug problems. When an issue will not go away and the same code refuses to reproduce someone else's results, it is easy to get impatient. But things that come too easily teach you little; some mistakes simply have to be made sooner or later. Onward!
Articles consulted:
https://www.zhihu.com/question/64494691?sort=created
https://blog.csdn.net/abc13526222160/article/details/90057121
https://blog.csdn.net/sunqiande88/article/details/80100891?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1#
https://blog.csdn.net/gzroy/article/details/82386540?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.nonecase