Monkey or Baboon? Hard to Tell Apart: Building a TFRecord Dataset and Training a Convolutional Neural Network on It

hznuhise_jeffrey

Tags: deep learning, tensorflow, computer vision

Article link: https://blog.csdn.net/hznuhise_jeffrey/article/details/87903781

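I studied deep learning at the end of last year, but by the time I came back from the winter break I had forgotten most of it. To consolidate what I had learned, I recently built a small example with a convolutional neural network (CNN). A CNN is a multi-layer neural network that is well suited to machine learning problems on images, large images in particular. Through a series of techniques such as convolution and pooling, it progressively reduces the dimensionality of an image recognition problem that starts out with a huge amount of data, until the problem becomes trainable. To test how well a CNN performs, I deliberately chose two animals that look alike, monkeys and baboons, as the training images.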

[Step 1: Data preparation]

First, to make it easier to build the TFRecord dataset and to train later, create two folders, "train" and "test", inside one parent folder.

Then, inside each of these two folders, create the folders "monkey" and "baboon" that will hold the images of the two animals.
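
The folder layout described above can also be created from a script. The following is a minimal sketch, not part of the original post: the helper name make_dataset_dirs is hypothetical, and a temporary directory stands in for the real dataset root.

```python
import os
import tempfile


def make_dataset_dirs(root, splits=("train", "test"), classes=("monkey", "baboon")):
    """Create root/<split>/<class> for every split and class; return the paths."""
    created = []
    for split in splits:
        for cls in classes:
            path = os.path.join(root, split, cls)
            os.makedirs(path, exist_ok=True)  # no error if the folder already exists
            created.append(path)
    return created


if __name__ == "__main__":
    # A temporary directory is used here only for demonstration (assumption).
    with tempfile.TemporaryDirectory() as demo_root:
        for p in make_dataset_dirs(demo_root):
            print(p)
```

To use it on a real dataset, pass the parent folder of your choice as root instead of the temporary directory.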

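The folders are still empty at this point and there is no ready-made dataset, so I used a Baidu image crawler to collect about 2000 images of monkeys and baboons. The crawler script is as follows: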

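# -*- coding: utf-8 -*-
"""Download Baidu image search results for a search term."""
import re
import urllib.parse  # urllib.parse must be imported explicitly to use quote()
import requests

keyWord = '狒狒'  # search term ("baboon" in Chinese)
savePath = 'E:/python/2019_02_24/train/baboon/'  # output folder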

def get_onepage_urls(onepageurl):
    """Return all image URLs on one result page plus the URL of the next page."""
    if not onepageurl:
        print('Reached the last page, done')
        return [], ''
    try:
        html = requests.get(onepageurl)
        html.encoding = 'utf-8'
        html = html.text
    except Exception as e:
        print(e)
        pic_urls = []
        fanye_url = ''
        return pic_urls, fanye_url
    pic_urls = re.findall('"objURL":"(.*?)",', html, re.S)
    fanye_urls = re.findall(re.compile(r'<a href="(.*)" class="n">下一页</a>'), html, flags=0)
    fanye_url = 'http://image.baidu.com' + fanye_urls[0] if fanye_urls else ''
    return pic_urls, fanye_url


def down_pic(pic_urls):
    """Download every image in the given list of URLs."""
    for i, pic_url in enumerate(pic_urls):
        try:
            pic = requests.get(pic_url, timeout=15)
            string = savePath + str(i + 1) + '.jpg'
            with open(string, 'wb') as f:
                f.write(pic.content)
                print('Downloaded image %s: %s' % (str(i + 1), str(pic_url)))
        except Exception as e:
            print('Failed to download image %s: %s' % (str(i + 1), str(pic_url)))
            print(e)
            continue


if __name__ == '__main__':
    keyword = keyWord  # search term; change it to whatever you would type into Baidu image search
    url_init_first = 'http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1497491098685_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&ctd=1497491098685%5E00_1519X735&word='
    url_init = url_init_first + urllib.parse.quote(keyword, safe='/')
    all_pic_urls = []
    onepage_urls, fanye_url = get_onepage_urls(url_init)
    all_pic_urls.extend(onepage_urls)

    fanye_count = 0  # number of pages turned so far
    while True:
        onepage_urls, fanye_url = get_onepage_urls(fanye_url)
        fanye_count += 1
        # print('page %s' % str(fanye_count))
        if fanye_url == '' and onepage_urls == []:
            break
        all_pic_urls.extend(onepage_urls)

    down_pic(list(set(all_pic_urls)))

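Here keyWord is the term you want to search for and savePath is the folder the images are saved to. When the script runs, it downloads images matching the search term from Baidu into that folder. Run it once per class, changing keyWord and savePath each time. After the download finishes, delete any images that cannot be displayed or that are unrelated to the search term. Finally, for convenience, I simply moved a small portion of the images into the test folder of the corresponding class.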
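
The last step, moving a small portion of the images into the test folder, was done by hand in this post. As a sketch of how it could be scripted instead: the helper name move_to_test, the 10% default fraction, and the fixed seed are all assumptions made for illustration, not values from the original.

```python
import os
import random
import shutil


def move_to_test(train_dir, test_dir, fraction=0.1, seed=0):
    """Move a random `fraction` of the files in train_dir into test_dir."""
    os.makedirs(test_dir, exist_ok=True)
    names = sorted(
        f for f in os.listdir(train_dir)
        if os.path.isfile(os.path.join(train_dir, f))
    )
    random.Random(seed).shuffle(names)  # seeded so the split is reproducible
    n_test = int(len(names) * fraction)
    moved = names[:n_test]
    for name in moved:
        shutil.move(os.path.join(train_dir, name), os.path.join(test_dir, name))
    return moved
```

It would be called once per class, for example with the "train/monkey" folder as train_dir and the "test/monkey" folder as test_dir.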

[Step 2: Building the TFRecord data]

There are many scripts online for building TFRecord data and they are all much alike. The one I used is as follows:

import os
import tensorflow as tf  # TensorFlow 1.x API
from PIL import Image

cwd = 'E:/python/2019_02_24/train/'  # root folder of the images
# Use a list, not a set: a set has no guaranteed iteration order, so the label
# assigned to each class could differ between the train run and the test run.
classes = ['monkey', 'baboon']  # the 2 classes; the index in this list is the label
writer = tf.python_io.TFRecordWriter("20190224_train.tfrecords")  # output file

for index, name in enumerate(classes):
    class_path = cwd + name + '/'
    for img_name in os.listdir(class_path):
        img_path = class_path + img_name  # path of one image

        img = Image.open(img_path)
        img = img.convert("RGB")  # force a 3-channel RGB image
        img = img.resize((24, 24))
        img_raw = img.tobytes()  # raw pixel bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))  # wrap the label and the image data in an Example
        writer.write(example.SerializeToString())  # serialize to a string
writer.close()

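The class names in classes must match the folder names exactly. Here I shrink every image to a uniform 24*24*3. Running the script once on the train folder and once more on the test folder (with cwd and the output file name changed accordingly, to 20190224_test.tfrecords as read by the training script) yields the TFRecord data for the training set and the test set.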
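
The script stores each image as raw bytes (img.tobytes()), and the training script later turns those bytes back into a 24*24*3 array with tf.decode_raw and tf.reshape. The sketch below illustrates that round trip with NumPy only, assuming a random array as a stand-in for a real resized RGB image; the helper names are hypothetical.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 24, 24, 3


def image_to_bytes(pixels):
    """Flatten a uint8 HxWx3 array into raw bytes, row by row."""
    assert pixels.dtype == np.uint8
    assert pixels.shape == (HEIGHT, WIDTH, CHANNELS)
    return pixels.tobytes()


def bytes_to_image(raw):
    """Rebuild the HxWx3 uint8 array from the raw bytes."""
    return np.frombuffer(raw, dtype=np.uint8).reshape(HEIGHT, WIDTH, CHANNELS)


# Random pixels stand in for a real image (assumption for the demo).
rng = np.random.RandomState(0)
pixels = rng.randint(0, 256, size=(HEIGHT, WIDTH, CHANNELS)).astype(np.uint8)

raw = image_to_bytes(pixels)
restored = bytes_to_image(raw)
print(len(raw))                          # 24 * 24 * 3 = 1728 bytes per image
print(np.array_equal(pixels, restored))  # the round trip loses nothing
```

The byte string carries no shape information, which is why the reader must reshape to exactly the size used when writing.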

[Step 3: Training]

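My network structure is: input --> convolution layer with a 5*5 kernel and stride 1 --> max pooling layer with window size 2 and stride 2 --> convolution layer with a 5*5 kernel and stride 1 --> max pooling layer with window size 2 and stride 2 --> convolution layer with a 5*5 kernel and stride 1 --> average pooling layer with window size 6 and stride 6 --> fully connected layer --> output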
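
To see how the feature map shrinks through this structure, the spatial sizes can be worked out by hand. The sketch below assumes padding='SAME' everywhere (as in the training script), for which the output size is ceil(input / stride); the channel counts 32, 32, and 2 are taken from the weight shapes in the training script, and the helper names are hypothetical.

```python
import math


def same_padding_out(size, stride):
    """Spatial output size of a conv/pool layer with padding='SAME'."""
    return math.ceil(size / stride)


def trace_shapes(size=24):
    """Return (layer, height/width, channels) for each stage of the network."""
    layers = [
        ("conv1", 1, 32),      # conv, stride 1
        ("max_pool1", 2, 32),  # 2x2 window, stride 2
        ("conv2", 1, 32),      # conv, stride 1
        ("max_pool2", 2, 32),  # 2x2 window, stride 2
        ("conv3", 1, 2),       # conv, stride 1, one output channel per class
        ("avg_pool", 6, 2),    # 6x6 window, stride 6
    ]
    shapes = [("input", size, 3)]
    for name, stride, channels in layers:
        size = same_padding_out(size, stride)
        shapes.append((name, size, channels))
    return shapes


for name, size, channels in trace_shapes():
    print("%-10s %2d x %2d x %2d" % (name, size, size, channels))
```

The 24*24 input is halved twice to 6*6, so the 6*6 average pooling collapses each of the two remaining channels to a single value per image.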

To improve the model's accuracy, I added batch normalization and learning-rate decay. The program is as follows:

import tensorflow as tf 
import numpy as np
from tensorflow.contrib.layers.python.layers import batch_norm

''' 1. Prepare the dataset '''
# Read the training data
filename_queue1 = tf.train.string_input_producer(["20190224_train.tfrecords"])  # queue of input files
reader1 = tf.TFRecordReader()
_, serialized_example1 = reader1.read(filename_queue1)   # returns (key, serialized example)
features1 = tf.parse_single_example(serialized_example1,
                                   features={
                                       'label': tf.FixedLenFeature([], tf.int64),
                                       'img_raw' : tf.FixedLenFeature([], tf.string),
                                   })  # parse out the features holding the image and the label
image1 = tf.decode_raw(features1['img_raw'], tf.uint8)
image1 = tf.reshape(image1, [24, 24, 3])
label1 = tf.cast(features1['label'], tf.int32)

# Training batches. Be sure to shuffle with shuffle_batch; otherwise the accuracy alternates between 0 and 1 during training
image_batch, label_batch = tf.train.shuffle_batch([image1, label1],batch_size = 128,capacity=2000,min_after_dequeue=1000)

filename_queue2 = tf.train.string_input_producer(["20190224_test.tfrecords"])  # queue of input files
reader2 = tf.TFRecordReader()
_, serialized_example2 = reader2.read(filename_queue2)   # returns (key, serialized example)
features2 = tf.parse_single_example(serialized_example2,
                                   features={
                                       'label': tf.FixedLenFeature([], tf.int64),
                                       'img_raw' : tf.FixedLenFeature([], tf.string),
                                   })  # parse out the features holding the image and the label
image2 = tf.decode_raw(features2['img_raw'], tf.uint8)
image2 = tf.reshape(image2, [24, 24, 3])
label2 = tf.cast(features2['label'], tf.int32)

# Test batch, also shuffled
images_test, labels_test = tf.train.shuffle_batch([image2, label2],batch_size = 512,capacity=2000,min_after_dequeue=1000)

''' 2. Build the network '''
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
  
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')  
                        
def avg_pool_6x6(x):
  return tf.nn.avg_pool(x, ksize=[1, 6, 6, 1],
                        strides=[1, 6, 6, 1], padding='SAME')
                        
def batch_norm_layer(value, train, name='batch_norm'):
  # Pass the flag straight through as is_training. The original test
  # `train is not None` is always true for a placeholder, so the layer
  # never switched to inference mode.
  return batch_norm(value, decay=0.9, updates_collections=None, is_training=train)

# Placeholders
x = tf.placeholder(tf.float32, [None, 24, 24, 3])  # input is 24*24*3
y = tf.placeholder(tf.float32, [None, 2])  # 2 classes
# Boolean training flag; defaults to False (inference) when it is not fed
train = tf.placeholder_with_default(False, shape=[])

# Network structure
W_conv1 = weight_variable([5, 5, 3, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1,24,24,3])

h_conv1 = tf.nn.relu(batch_norm_layer((conv2d(x_image, W_conv1) + b_conv1),train))
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 32])
b_conv2 = bias_variable([32])

h_conv2 = tf.nn.relu(batch_norm_layer((conv2d(h_pool1, W_conv2) + b_conv2),train))
h_pool2 = max_pool_2x2(h_conv2)

W_conv3 = weight_variable([5, 5, 32, 2])
b_conv3 = bias_variable([2])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)

nt_hpool3 = avg_pool_6x6(h_conv3)  # 2 channels left, one per class
nt_hpool3_flat = tf.reshape(nt_hpool3, [-1, 2])

y_conv = tf.contrib.layers.fully_connected(nt_hpool3_flat,2,activation_fn=tf.nn.softmax)

# Cross-entropy loss. y_conv is already a softmax output; it is clipped so that log never receives 0
cross_entropy = -tf.reduce_sum(y * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
#cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_conv))

# Learning-rate decay
global_step = tf.Variable(0, trainable=False)
decaylearning_rate = tf.train.exponential_decay(0.04, global_step,1000, 0.9)

# Optimizer
train_step = tf.train.AdamOptimizer(decaylearning_rate).minimize(cross_entropy,global_step=global_step)
#train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

''' 3. Train '''
sess = tf.Session()
sess.run(tf.global_variables_initializer())
tf.train.start_queue_runners(sess=sess)
for i in range(15000):
    image_bth, label_b = sess.run([image_batch, label_batch])

    label_bth = np.eye(2, dtype=float)[label_b]  # one-hot encode the labels
    #print(label_bth)

    train_step.run(feed_dict={x: image_bth, y: label_bth, train: True}, session=sess)

    if i % 200 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: image_bth, y: label_bth}, session=sess)
        print("step %d, training accuracy %g" % (i, train_accuracy))
image_bth, label_b = sess.run([images_test, labels_test])
label_bth = np.eye(2, dtype=float)[label_b]
print("finished! test accuracy %g" % accuracy.eval(feed_dict={
    x: image_bth, y: label_bth}, session=sess))
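
The learning-rate decay used in the program, tf.train.exponential_decay(0.04, global_step, 1000, 0.9) with its default staircase=False, follows the formula lr = 0.04 * 0.9 ^ (global_step / 1000). The plain-Python sketch below reproduces that formula so the schedule can be inspected without TensorFlow; the function name is hypothetical.

```python
def decayed_learning_rate(step, base_lr=0.04, decay_steps=1000, decay_rate=0.9):
    """Continuous exponential decay: base_lr * decay_rate ** (step / decay_steps)."""
    return base_lr * decay_rate ** (step / decay_steps)


# Learning rate at a few points of the 15000-step training run.
for step in (0, 1000, 5000, 15000):
    print(step, decayed_learning_rate(step))
```

The rate drops by 10% for every 1000 steps, so it ends the 15000-step run at roughly a fifth of its starting value.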

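Note that when drawing a batch from the dataset, it is best to shuffle the order with tf.train.shuffle_batch. The TFRecord file is written one class after the other, so batches taken in file order tend to contain a single class, and the accuracy shown during training keeps reading 0 or 1. The test set must be shuffled as well; otherwise the test accuracy at the end is 1 no matter how many iterations you run. Ideally we would like a model whose accuracy really is 1, but in practice a model that generalizes well is very unlikely to reach exactly 1. At first I thought this was caused by overfitting, but adding a regularization term did not improve it. Only after changing the number of iterations and seeing that the test accuracy stayed at 1 did I find the problem, so it is worth marking down here.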
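
A small simulation can make the shuffling issue concrete. The sketch below is an illustration under stated assumptions, not the author's experiment: it uses a hypothetical dataset of 512 labels per class stored class by class, and a trivial "model" that always predicts class 0.

```python
import random


def batch_accuracies(labels, batch_size, predicted_class=0):
    """Accuracy per consecutive batch of a model that always predicts one class."""
    accs = []
    for start in range(0, len(labels), batch_size):
        batch = labels[start:start + batch_size]
        hits = sum(1 for label in batch if label == predicted_class)
        accs.append(hits / len(batch))
    return accs


# Labels in file order: all of class 0, then all of class 1 (assumed counts).
labels = [0] * 512 + [1] * 512

in_order = batch_accuracies(labels, 128)

shuffled = labels[:]
random.Random(0).shuffle(shuffled)
mixed = batch_accuracies(shuffled, 128)

print(in_order)  # every batch is all one class, so accuracy is 1.0 or 0.0
print(mixed)     # shuffled batches mix both classes
```

Even a model that has learned nothing scores a perfect 1 on a single-class batch, which is why an unshuffled batch says little about real accuracy.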

The accuracy of the model I finally trained was shown as a screenshot in the original post.