Training and Testing Your Own Dataset with TensorFlow

1. Choosing the Data

Dataset: CelebA (CelebFaces Attributes Dataset). It contains over 200,000 face images and is mainly used for face attribute tasks: gender, age, glasses, beard, and so on.

Here I train a gender classifier for faces. I first sort and label the images, normalize them, and convert them into TFRecord format (image shape (224, 224, 3)), which is convenient for TensorFlow training.

The prepared dataset is available on Baidu Netdisk:

https://pan.baidu.com/s/1ptteUCu02TzHD1e-JXl6Ig (extraction code: hmpt)

It contains two versions: the full dataset and a much smaller one for quickly generating records and running training. With 200,000 images, generating the full TFRecords takes at least half a day on my old machine o(╥﹏╥)o.
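For reference, train.txt and test.txt are plain text lists with one "image_path label" pair per line (label 1 for male, 0 for female). Below is a rough sketch of how such a list could be built from CelebA's attribute annotations; it assumes the usual list_attr_celeba.txt layout, and the paths and the train/test split are placeholders.

# Hypothetical list builder: turns CelebA's attribute annotations into
# "image_path label" lines, where label is 1 for male and 0 for female.
img_dir = 'H:/DATA/CelebA/img_align_celeba'          # placeholder path
attr_file = 'H:/DATA/CelebA/list_attr_celeba.txt'    # placeholder path

with open(attr_file, 'r') as f:
    lines = f.readlines()
attr_names = lines[1].split()        # second line lists the attribute names
male_idx = attr_names.index('Male')

with open('train.txt', 'w') as out:
    for line in lines[2:]:           # one "filename attr1 ... attr40" line per image
        parts = line.split()
        label = 1 if parts[male_idx + 1] == '1' else 0   # attribute values are +1 / -1
        out.write('%s/%s %d\n' % (img_dir, parts[0], label))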

The code for generating and reading the TFRecords is attached below.

import random

import tensorflow as tf
from PIL import Image

# Path to your dataset
src_pic_dir = r'H:/DATA/Gender/Gender_tiny'
orig_picture_test = src_pic_dir + '/test/test.txt'
orig_picture_train = src_pic_dir + '/train/train.txt'
# Output locations for the generated TFRecord files
record_test = src_pic_dir + '/tf_test_224.tfrecord'
record_train = src_pic_dir + '/tf_train_224.tfrecord'
# Classes to recognize
classes = {'0', '1'}
# Target image size after resizing
IMG_SIZE = 224
# Build the TFRecords file
def create_record(record_path, orig_pic):
    writer = tf.python_io.TFRecordWriter(record_path)
    file_ = open(orig_pic, 'r')
    file_list = file_.readlines()
    # Shuffle the dataset before writing
    random.shuffle(file_list)
    for i in range(len(file_list)):
        name = file_list[i]
        name = name.strip('\n')
        spt = name.split(' ')
        img_path = spt[0]
        index = int(spt[-1])
        #print(name + '  ', str(index))
        img = Image.open(img_path)
        img = img.resize((IMG_SIZE, IMG_SIZE))  # resize to the target size
        # Optional grayscale conversion
        # img = img.convert("L")
        img_raw = img.tobytes()  # serialize the image to raw bytes
        example = tf.train.Example(
            features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
        writer.write(example.SerializeToString())
    writer.close()
# Read and decode the TFRecord data
def read_and_decode(filename, image_size, is_batch=True, batch_size=3):
    # Create a file queue with no limit on the number of reads
    filename_queue = tf.train.string_input_producer([filename])
    # Create a reader from the file queue
    reader = tf.TFRecordReader()
    # The reader pulls one serialized example from the queue
    _, serialized_example = reader.read(filename_queue)
    # Parse the serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string)
        })
    print(features['img_raw'])
    # Decode as uint8: PIL's tobytes() produces unsigned 8-bit pixels
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    print(img)
    img = tf.reshape(img, [image_size, image_size, 3])
    # Scale pixels to [-0.5, 0.5]
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int32)
    print('img:', img, ',label:', label)
    min_after_dequeue = 300
    capacity = min_after_dequeue + 3 * batch_size
    if is_batch:
        print('capacity:', str(capacity), ' ,min_after_dequeue:', str(min_after_dequeue))
        img, label = tf.train.shuffle_batch([img, label],
                                            batch_size=batch_size,
                                            num_threads=1,
                                            capacity=capacity,
                                            min_after_dequeue=min_after_dequeue)
    else:
        img, label = tf.train.batch([img, label], batch_size=batch_size, capacity=capacity)
    return img, label
if __name__ == '__main__':
    create_record(record_train,orig_picture_train)
    create_record(record_test,orig_picture_test)

There are plenty of tutorials on creating and reading TFRecords online, so I won't go into more detail here.

One point deserves emphasis, though: shuffle the data when building the records. I do this with

random.shuffle(file_list)

This puzzled me for a long time during training. From the printouts I eventually noticed that every batch of batch_size samples belonged to the same class. After reading the documentation of

tf.train.shuffle_batch

I came to understand that although this function does shuffle, it appears to shuffle only within its capacity-sized queue rather than across the whole dataset. If the examples are enqueued in their original (sorted) order, the queue is likely to contain a single class at a time, which explains why every batch drawn during training was of the same class.
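To confirm that batches really are mixed, a quick check along the following lines can be run against the generated record. This is a minimal sketch that reuses the read_and_decode function above; the record path and batch size are placeholders.

import numpy as np
import tensorflow as tf

# Pull a few batches and print how many samples of each class they contain.
# Without random.shuffle(file_list) at record-creation time, these counts
# tend to come out as all one class.
imgs, labels = read_and_decode('tf_train_224.tfrecord', 224, batch_size=32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    for _ in range(5):
        batch_labels = sess.run(labels)
        print(np.bincount(batch_labels, minlength=2))  # e.g. [17 15] when well mixed
    coord.request_stop()
    coord.join(threads)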

2. Choosing the Network

I use the classic AlexNet: https://cloud.tencent.com/developer/news/230380

Code:

import tensorflow as tf

# Small helper used throughout: prints each layer's name and output shape
def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())

def alexnet(datas,n_output,keep_prob,training,Is_train):
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.random_normal([11,11,3,64],dtype=tf.float32,stddev=0.01),name='weight1')
        conv = tf.nn.conv2d(datas,kernel,[1,4,4,1],padding='SAME')
        biases = tf.Variable(tf.constant(0.0,shape=[64],dtype=tf.float32),trainable=True,name='biases1')
        conv1 = tf.nn.bias_add(conv,biases)
        conv1 = tf.layers.batch_normalization(conv1,training=training)
        conv1 = tf.nn.relu(conv1,name=scope)
        print_activations(conv1)
        tf.summary.histogram('weight1', kernel)
        tf.summary.histogram('biases1', biases)
        #tf.summary.histogram(scope , conv1)
    conv1 = tf.nn.lrn(conv1,bias=2.0,alpha=2e-04,beta=0.75,name='lrn1')
    pool1 = tf.nn.max_pool(conv1,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME',name='pool1')
    print_activations(pool1)
    tf.summary.histogram('pool1' , pool1)
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.random_normal([5,5,64,128],dtype=tf.float32,stddev=0.01),name='weight2')
        conv = tf.nn.conv2d(pool1,kernel,[1,1,1,1],padding='SAME')
        biases = tf.Variable(tf.constant(0.0,shape=[128],dtype=tf.float32),trainable=True,name='biases2')
        conv2 = tf.nn.bias_add(conv,biases)
        conv2 = tf.layers.batch_normalization(conv2,training=training)
        conv2 = tf.nn.relu(conv2,name=scope)
        print_activations(conv2)
        tf.summary.histogram('weight2' , kernel)
        tf.summary.histogram('biases2' , biases)
        #tf.summary.histogram(scope , conv2)
    #lrn2 = tf.nn.lrn(conv2,bias=2.0,alpha=2e-05,beta=0.75,name='lrn2')
    pool2 = tf.nn.max_pool(conv2,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME',name='pool2')
    print_activations(pool2)
    tf.summary.histogram('pool2' , pool2)
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.random_normal([3,3,128,256],dtype=tf.float32,stddev=0.01),name='weight3')
        conv = tf.nn.conv2d(pool2,kernel,[1,1,1,1],padding='SAME')
        biases = tf.Variable(tf.constant(0.0,shape=[256],dtype=tf.float32),trainable=True,name='biases3')
        conv3 = tf.nn.bias_add(conv,biases)
        #conv3 = tf.layers.batch_normalization(conv3,training=training)
        conv3 = tf.nn.relu(conv3,name=scope)
        tf.summary.histogram('weight3' + '/activations', kernel)
        tf.summary.histogram('biases3' + '/activations', biases)
        tf.summary.histogram(scope + '/activations', conv3)
        print_activations(conv3)

    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.random_normal([3,3,256,384],dtype=tf.float32,stddev=0.01),name='weight4')
        conv = tf.nn.conv2d(conv3,kernel,[1,1,1,1],padding='SAME')
        biases = tf.Variable(tf.constant(0.0,shape=[384],dtype=tf.float32),trainable=True,name='biases4')
        conv4 = tf.nn.bias_add(conv,biases)
        #conv4 = tf.layers.batch_normalization(conv4,training=training)
        conv4 = tf.nn.relu(conv4,name=scope)
        print_activations(conv4)
        tf.summary.histogram('weight4' + '/activations', kernel)
        tf.summary.histogram('biases4' + '/activations', biases)
        tf.summary.histogram(scope + '/activations', conv4)
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.random_normal([3,3,384,256],dtype=tf.float32,stddev=0.01),name='weight5')
        conv = tf.nn.conv2d(conv4,kernel,[1,1,1,1],padding='SAME')
        biases = tf.Variable(tf.constant(0.0,shape=[256],dtype=tf.float32),trainable=True,name='biases5')
        conv5 = tf.nn.bias_add(conv,biases)
        #conv5 = tf.layers.batch_normalization(conv5,training=training)
        conv5 = tf.nn.relu(conv5,name=scope)
        print_activations(conv5)
        tf.summary.histogram('weight5' , kernel)
        tf.summary.histogram('biases5' , biases)
        #tf.summary.histogram(scope , conv5)
    pool5 = tf.nn.max_pool(conv5,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME',name='pool5')
    print_activations(pool5)
    tf.summary.histogram('pool5' , pool5)
    with tf.name_scope('fc6') as scope:
        kernel = tf.Variable(tf.random_normal([7*7*256,4096],dtype=tf.float32,stddev=0.01),name='weight6')
        flat = tf.reshape(pool5,[-1,7*7*256])
        biases = tf.Variable(tf.constant(0.0,shape=[4096],dtype=tf.float32),trainable=True,name='biases6')
        fc6 = tf.add(tf.matmul(flat,kernel) , biases)
        #fc6 = tf.layers.batch_normalization(fc6,training=training)
        fc6 = tf.nn.relu(fc6,name='relu')
        # Note: this comparison only behaves as intended when Is_train is a plain
        # Python int. If a tf.placeholder is passed in (as in the training script
        # below), the == is evaluated at graph-construction time and dropout is
        # silently skipped; the same applies to fc7. Feeding keep_prob = 1.0 at
        # test time is the safer switch.
        if Is_train == 1:
            fc6 = tf.nn.dropout(fc6,keep_prob,name=scope)
        print_activations(fc6)
        tf.summary.histogram('weight6' , kernel)
        tf.summary.histogram('biases6' , biases)
        tf.summary.histogram(scope , fc6)
    with tf.name_scope('fc7') as scope:
        kernel = tf.Variable(tf.random_normal([4096,4096],dtype=tf.float32,stddev=0.01),name='weight7')
        biases = tf.Variable(tf.constant(0.0,shape=[4096],dtype=tf.float32),trainable=True,name='biases7')
        fc7 = tf.add(tf.matmul(fc6,kernel) , biases)
        #fc7 = tf.layers.batch_normalization(fc7,training=training)
        fc7 = tf.nn.relu(fc7,name='relu')
        if Is_train == 1:
            fc7 = tf.nn.dropout(fc7,keep_prob,name=scope)
        print_activations(fc7)
        tf.summary.histogram('weight7' , kernel)
        tf.summary.histogram('biases7' , biases)
        tf.summary.histogram(scope , fc7)
    with tf.name_scope('fc8') as scope:
        kernel = tf.Variable(tf.random_normal([4096,n_output],dtype=tf.float32,stddev=0.01),name='weight8')
        biases = tf.Variable(tf.constant(0.0,shape=[n_output],dtype=tf.float32),trainable=True,name='biases8')
        fc8 = tf.nn.bias_add(tf.matmul(fc7,kernel),biases)
        print_activations(fc8)
        tf.summary.histogram('weight8' , kernel)
        tf.summary.histogram('biases8' , biases)
        tf.summary.histogram(scope , fc8)
    return fc8
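For reference, the 7*7*256 input size of fc6 follows from how the spatial resolution shrinks through the network. A rough trace for the 224x224 input used here (all layers use 'SAME' padding, so each stride-s layer divides the spatial size by s, rounding up):

# conv1, stride 4: 224 / 4 = 56
# pool1, stride 2:  56 / 2 = 28
# pool2, stride 2:  28 / 2 = 14
# pool5, stride 2:  14 / 2 = 7
size = 224
for stride in (4, 2, 2, 2):
    size = -(-size // stride)  # ceiling division, matching 'SAME' padding
print(size * size * 256)       # 12544 == 7 * 7 * 256, the flattened fc6 input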

3. Training

Training code:

import os

import numpy as np
import tensorflow as tf

import make_record   # the TFRecord reader defined above
import net           # the AlexNet definition above

def run_alexnet():
    epochs = 10000        # number of training epochs
    image_size = 224      # image size
    batch_train = 32      # training batch size
    batch_test = 32       # test batch size
    total_batch = 200     # batches per epoch
    n_output = 2          # number of classes
    dropout_rate = 0.85   # fraction of units kept by dropout
    train_record = r'H:/DATA/Gender/Gender_tiny/tf_train_224.tfrecord'
    test_record = r'H:/DATA/Gender/Gender_tiny/tf_test_224.tfrecord'
    # Directory for TensorBoard logs; create it first if it does not exist
    train_dir = './tflearn_logs/test_Gender'
    # Directory where model checkpoints are saved
    save_dir = './save_model/test_Gender'
    # Placeholders
    X = tf.placeholder(tf.float32,shape=[batch_train,image_size,image_size,3])
    Y = tf.placeholder(tf.int32,shape=[batch_train])
    keep_prob = tf.placeholder(tf.float32)
    Is_training = tf.placeholder(tf.int32)
    training = tf.placeholder_with_default(False, shape=(), name='training')
    # Network
    pred = net.alexnet(X,n_output,keep_prob,training,Is_train=Is_training)
    print('y_res:',Y,',pred:',pred)

    top_k_op = tf.nn.in_top_k(pred, Y, 1)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=pred),name='loss')
    tf.summary.scalar('loss',loss)
    # Exponentially decaying learning rate
    global_step = tf.Variable(0,trainable = False)
    learning_rate = tf.train.exponential_decay(0.01, global_step, 500,   0.99,  staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)
    # Training and test input pipelines
    train_x,train_y = make_record.read_and_decode(train_record,image_size,batch_size=batch_train)
    test_x,test_y = make_record.read_and_decode(test_record,image_size,batch_size=batch_test)
    # Merge all summaries for TensorBoard
    summary_op = tf.summary.merge_all()
    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    init = tf.global_variables_initializer()
    # Saver; keeps at most max_to_keep checkpoints
    saver = tf.train.Saver(max_to_keep=3)
    # Whether to restore an existing checkpoint and continue training
    Retrain = True
    # Optionally cap GPU memory usage (here 20%), useful when GPU memory is tight
    #config = tf.ConfigProto()
    #config.gpu_options.per_process_gpu_memory_fraction = 0.2
    #sess = tf.InteractiveSession(config=config)

    with tf.Session() as sess:
        sess.run(init)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess,coord)
        summary_writer = tf.summary.FileWriter(train_dir, sess.graph)

        ckpt = tf.train.get_checkpoint_state(save_dir)
        if ckpt and ckpt.model_checkpoint_path and Retrain:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(save_dir, ckpt_name))
        try:
            for epoch in range(epochs):
                print('epoch:',str(epoch))
                for i in range(total_batch):#tqdm(range(total_batch)):
                    feed_x,feed_y = sess.run([train_x,train_y])
                    step_val,train_loss,_,_ = sess.run([global_step,loss,train_step,extra_update_ops],
                                  feed_dict={training: True,Is_training:1,X:feed_x,Y:feed_y,keep_prob:dropout_rate})
                    if step_val % 20 == 0:  # every 20 steps, evaluate on a test batch
                        feed_test_x,feed_test_y = sess.run([test_x,test_y])
                        train_top_k_op = sess.run(top_k_op,
                                feed_dict={training:False,Is_training: 0,X:feed_test_x,Y:feed_test_y,keep_prob:dropout_rate})
                        learning_rate_val = sess.run(learning_rate)
                        predict =  np.sum(train_top_k_op)
                        train_accuracy = predict/batch_test
                        print('step:',str(step_val),' ,learning_rate:',str(learning_rate_val),
                              ' ,loss:',str(train_loss),' , predictNum: ',str(predict),' ,acc:',str(train_accuracy))
                    if step_val % 100 == 0:
                        # Run the summary op and write the result so parameter changes can be inspected in TensorBoard
                        summary_str = sess.run(summary_op,feed_dict={training:True,Is_training: 1,X:feed_x,Y:feed_y,keep_prob:dropout_rate})
                        summary_writer.add_summary(summary_str, step_val)
                if  ((epoch+1)*total_batch) % 1000 == 0:
                        saver.save(sess,save_dir + '/model.ckpt',global_step=step_val)

        except tf.errors.OutOfRangeError:
            print('complete')
        finally:
            coord.request_stop()
        coord.join(threads)
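While training runs, the summaries written to train_dir can be watched in TensorBoard, e.g.:

tensorboard --logdir=./tflearn_logs/test_Gender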

4. Testing

def evaluate():
    batch_size = 64
    n_output = 2
    image_size = 224
    save_dir = './save_model/test_gender'
    test_record = r'H:/DATA/CelebA_CelebFaces_Attributes_Dataset/norm/Gender_norm/tf_test_224.tfrecord'
    X = tf.placeholder(tf.float32,shape=[batch_size,image_size,image_size,3])
    Y = tf.placeholder(tf.int32,shape=[batch_size])
    keep_prob = tf.placeholder(tf.float32)
    training = tf.placeholder_with_default(False, shape=(), name='training')
    test_x,test_y = make_record.read_and_decode(test_record,image_size,batch_size=batch_size)

    pred = net.alexnet(X,n_output,keep_prob,training=False,Is_train=False)
    # Y holds integer class labels, so compare it directly against argmax(pred)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(pred,-1),tf.cast(Y,tf.int64)),tf.float32))
    top_k_op = tf.nn.in_top_k(pred, Y, 1)
    saver = tf.train.Saver(tf.global_variables())

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess,coord)

        ckpt = tf.train.get_checkpoint_state(save_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(save_dir, ckpt_name))
        try:
            true_count = 0
            step = 0
            for n in range(200):
                if coord.should_stop():
                    break
                feed_x,feed_y = sess.run([test_x,test_y])
                test_accuracy, predict = sess.run([accuracy,top_k_op],feed_dict={X:feed_x,Y:feed_y,keep_prob:1.0})  # no dropout at evaluation time

                true_count += np.sum(predict)
                step += 1

            rate = true_count/(step*batch_size)
            print('count = ',str(step*batch_size),' , test accuracy = ',str(rate))
        except tf.errors.OutOfRangeError:
            print('complete')
        finally:
            coord.request_stop()
        coord.join(threads)

5. Problems Encountered During Training

1. The loss hovered around 0.6. The same thing happened when I trained CIFAR-10 at (224, 224, 3): the loss always settled around 2.3. For a classifier that has learned nothing and outputs a uniform distribution, the cross-entropy is ln(number of classes), so 2.3 for ten classes corresponds to ln(2) ≈ 0.69 for two classes.

A detailed discussion is here:

https://blog.csdn.net/weixin_34343689/article/details/88111552

but it didn't solve the problem for me, so I made the following changes.
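A quick sanity check of those plateau values:

import math

# Cross-entropy of a classifier that outputs a uniform distribution: -ln(1/n_classes)
print(-math.log(1.0 / 2))   # 0.6931... -> the ~0.69 plateau in binary classification
print(-math.log(1.0 / 10))  # 2.3025... -> the ~2.3 plateau on CIFAR-10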

(1) I tried making the learning rate decay exponentially:

    # Exponentially decaying learning rate
    global_step = tf.Variable(0,trainable = False)
    learning_rate = tf.train.exponential_decay(0.002, global_step, 500,   0.99,  staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)

(2) I changed the weight initialization: tf.truncated_normal became tf.random_normal, with the standard deviation set to 0.01.

kernel = tf.Variable(tf.random_normal([11,11,3,64],dtype=tf.float32,stddev=0.01),name='weight1')

The difference between the two is explained here: https://blog.csdn.net/u014687582/article/details/78027061
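For reference, the two initializers differ only in how they treat tail values. A small sketch (the shape is arbitrary):

import tensorflow as tf

# tf.random_normal samples from an unbounded normal distribution, while
# tf.truncated_normal re-draws any sample that falls more than two standard
# deviations from the mean, so it never produces extreme initial weights.
w_normal = tf.random_normal([11, 11, 3, 64], stddev=0.01)
w_truncated = tf.truncated_normal([11, 11, 3, 64], stddev=0.01)
with tf.Session() as sess:
    a, b = sess.run([w_normal, w_truncated])
    print(abs(a).max(), abs(b).max())  # the truncated version stays within about 2 * stddev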

What finally worked, after more searching, was adding batch normalization; a good introduction:

https://www.cnblogs.com/guoyaohua/p/8724433.html

The conv layers became:

conv1 = tf.nn.bias_add(conv,biases)
conv1 = tf.layers.batch_normalization(conv1,training=training)
conv1 = tf.nn.relu(conv1,name=scope)

It is usually placed before the ReLU. The official docs note that tf.layers.batch_normalization will be removed in the future, so tf.keras.layers.BatchNormalization should be used going forward. Also note that in training mode the moving-mean and moving-variance update ops have to be run, which is why the training code above fetches the tf.GraphKeys.UPDATE_OPS collection (extra_update_ops) together with train_step.
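An equivalent and slightly more common pattern is to attach those update ops to the train op with a control dependency (a minimal sketch):

# Make sure batch-norm moving statistics are updated on every training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)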

Funnily enough, I had tried these changes before without success; after tweaking things back and forth, it suddenly started working, o(╥﹏╥)o. Tuning hyperparameters really is a bit of a dark art. Good luck to anyone trying the same.
