TensorFlow 2.0 CNN in Practice (3): The CIFAR-10 Dataset

CIFAR-10 is another image-classification dataset with 10 classes. The training set has 50,000 color images of size 32*32, so it is a step up in difficulty from MNIST/FASHION_MNIST. The test set in my download has 300,000 images; I am not sure why it comes that way (it matches the Kaggle CIFAR-10 competition layout), but it trains fine, so let's not worry about it, haha. If you need the dataset, it is on Baidu Cloud: https://pan.baidu.com/s/1tZDvItpqAke_co2lIMrOHg extraction code: 24rd


Here train and test are the training and test sets; unzip them yourself. trainLabels.csv holds the training labels, and sampleSubmission.csv is the submission template for the test set, with every label set to 'cat'; we will fill in the real predictions later.

1. Dataset description

The dataset has 10 classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.
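As an aside, if you only need the standard CIFAR-10 split (50,000 training and 10,000 labeled test images) rather than the Kaggle-style files above, tf.keras can download it directly; a minimal sketch:

from tensorflow import keras

# downloads and caches the canonical CIFAR-10 split
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)

The rest of this post sticks with the file-based version, since the point is to practice reading images from disk.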


2. Import the required libraries

from __future__ import absolute_import, division, print_function, unicode_literals
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import os 
import sys
import time
import tensorflow as tf
from tensorflow import keras
import warnings
warnings.filterwarnings('ignore')

print(tf.__version__)
print(sys.version_info)
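Optionally, check whether TensorFlow can see a GPU; as noted at the end of this post, training this model on a CPU is quite slow. A minimal check (the experimental spelling is the TF 2.0 one):

# an empty list means TensorFlow will fall back to the CPU
print(tf.config.experimental.list_physical_devices('GPU'))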

3. Load the dataset

Store each image path together with its corresponding label in a list, which makes the later training steps easier.

class_names = ['airplane', 'automobile', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
train_labels_file = './cifar-10/trainLabels.csv'
test_csv_file = './cifar-10/sampleSubmission.csv'
train_folder = './cifar-10/train'
test_folder = './cifar-10/test'

def parse_csv_file(filepath, folder):
    """Parse a labels CSV into (image path, label) pairs."""
    results = []
    with open(filepath, 'r') as f:
        lines = f.readlines()[1:]   # skip the header row
    for line in lines:
        image_id, label_str = line.strip('\n').split(',')
        image_full_path = os.path.join(folder, image_id + '.png')
        results.append((image_full_path, label_str))
    return results

train_labels_info = parse_csv_file(train_labels_file, train_folder)
test_csv_info=parse_csv_file(test_csv_file,test_folder)
import pprint
pprint.pprint(train_labels_info[0:5])
pprint.pprint(test_csv_info[0:5])
print(len(train_labels_info),len(test_csv_info))
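As an aside, the same parsing can be done with pandas, assuming the CSV header columns are id and label as in the Kaggle files:

# equivalent to parse_csv_file, using pandas instead of manual splitting
labels = pd.read_csv(train_labels_file)
results = [(os.path.join(train_folder, str(i) + '.png'), lbl)
           for i, lbl in zip(labels['id'], labels['label'])]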


Take the first 45,000 training images as the new training set and the remaining 5,000 as the validation set, then convert the lists into DataFrames.

train_df=pd.DataFrame(train_labels_info[0:45000])
valid_df=pd.DataFrame(train_labels_info[45000:])
test_df=pd.DataFrame(test_csv_info)
train_df.columns=['filepath','class']
valid_df.columns=['filepath','class']
test_df.columns=['filepath','class']
print(train_df.head())
print(valid_df.head())
print(test_df.head())
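The fixed 45,000/5,000 slice assumes the row order in trainLabels.csv is unrelated to class. If you want to guard against any ordering, shuffle before splitting; a small sketch (the random_state is my own choice, for repeatability):

# shuffle the rows, then take the same 45000/5000 split
shuffled = pd.DataFrame(train_labels_info, columns=['filepath', 'class'])
shuffled = shuffled.sample(frac=1.0, random_state=7).reset_index(drop=True)
train_df, valid_df = shuffled[:45000], shuffled[45000:]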


4. Data augmentation

Here we use the Keras ImageDataGenerator API to preprocess the images for training: it reads the data and applies augmentation on the fly.

height=32
width=32
channels=3
batch_size=32
num_classes=10
# rescale pixels to [0, 1] and apply random augmentations during training
train_datagen=keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
# build a batch iterator over the image paths and labels in the DataFrame
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    directory='./',
    x_col='filepath',
    y_col='class',
    classes=class_names,
    target_size=(height,width),
    batch_size=batch_size,
    seed=7,
    shuffle=True,
    class_mode='sparse')  # labels come out as integer class indices
valid_datagen=keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255)
valid_generator = valid_datagen.flow_from_dataframe(
    valid_df,
    directory='./',
    x_col='filepath',
    y_col='class',
    classes=class_names,
    target_size=(height,width),
    batch_size=batch_size,
    seed=7,
    shuffle=False,
    class_mode='sparse')
train_num = train_generator.samples
valid_num = valid_generator.samples
print(train_num, valid_num)
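To see what the generator actually yields, you can pull one batch; images come out as float arrays in [0, 1], and because of class_mode='sparse' the labels are integer class indices:

x_batch, y_batch = next(train_generator)
print(x_batch.shape, y_batch.shape)  # (32, 32, 32, 3) (32,)
print(y_batch[:5])                   # e.g. [6. 2. 9. 0. 3.]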


5. Build the model

The network stacks three blocks, each with two convolutional layers followed by a pooling layer, and ends with fully connected layers; the optimizer is Adam. As the summary below shows, there are quite a lot of parameters, so with only a CPU (my laptop GPU is too low-end to help) training took a long time.

model = keras.models.Sequential()
model.add(keras.layers.Conv2D(filters=128,kernel_size=3,
                             padding='same',activation='relu',
                             input_shape=(width,height,channels)))
model.add(keras.layers.BatchNormalization())  # normalize activations so training converges faster
model.add(keras.layers.Conv2D(filters=128,kernel_size=3,
                             padding='same',activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.MaxPool2D(pool_size=2))

model.add(keras.layers.Conv2D(filters=256,kernel_size=3,
                             padding='same',activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Conv2D(filters=256,kernel_size=3,
                             padding='same',activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.MaxPool2D(pool_size=2))

model.add(keras.layers.Conv2D(filters=512,kernel_size=3,
                             padding='same',activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Conv2D(filters=512,kernel_size=3,
                             padding='same',activation='relu'))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.MaxPool2D(pool_size=2))

model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(512,activation='relu'))


model.add(keras.layers.Dense(num_classes,activation='softmax'))

# configure the learning process: loss function, optimizer and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
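Note that the 'sparse_categorical_crossentropy' loss matches the generator's class_mode='sparse': targets are integer class indices rather than one-hot vectors. A tiny illustration:

# with sparse labels, the target for a 'cat' image is just the index 3;
# 'categorical_crossentropy' would instead expect the one-hot vector below
sparse_label = 3                                    # class_names[3] == 'cat'
one_hot_label = tf.one_hot(sparse_label, depth=10)  # [0 0 0 1 0 0 0 0 0 0]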


6. Train the model

I trained for 20 epochs. I accidentally closed the training process partway through, but luckily the trained model had been saved; the final accuracy was about 80%.

epochs = 20
# batch_size=64
# note: fit_generator is deprecated in recent TF2 releases, where
# model.fit accepts generators directly
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_num // batch_size,
    epochs=epochs,
    validation_data=valid_generator,
    validation_steps=valid_num // batch_size)
model.save('cifar_10.h5')
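A minimal sketch to visualize the curves recorded in history (loss and accuracy for both training and validation):

def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 2)  # clip the y-axis so the curves stay readable
    plt.show()

plot_learning_curves(history)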


Next, run the trained model over the test set (this is inference, not training). For some reason the progress bar never showed up for me, possibly because there are so many images; never mind, let's just predict, hehe.

test_model = keras.models.load_model('cifar_10.h5')
test_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
    test_df,
    directory='./',
    x_col='filepath',
    y_col='class',
    classes=class_names,
    target_size=(height, width),
    batch_size=128,
    seed=7,
    shuffle=False,   # keep file order so predictions align with test_df
    class_mode='sparse')
test_num = test_generator.samples
test_predict = test_model.predict_generator(test_generator)
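Since sampleSubmission.csv has every label set to 'cat', these predictions are what fill it in. A sketch that writes a Kaggle-style submission file (the output path is my own choice, and this relies on shuffle=False above so the row order matches test_df):

# map each prediction to its class name and pair it with the image id
test_class_indices = np.argmax(test_predict, axis=1)
test_ids = [os.path.basename(p).split('.')[0] for p in test_df['filepath']]
submission = pd.DataFrame({
    'id': test_ids,
    'label': [class_names[i] for i in test_class_indices]})
submission.to_csv('./cifar-10/submission.csv', index=False)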

7. Predict individual images

I downloaded two images from the web; the file names are a bit casual, ^-^


Turn the image paths plus a placeholder 'cat' label into a DataFrame with the same structure as the one used for training.

list_img=['./cifar-10/predict/kache.png','./cifar-10/predict/ma1.png']
a = {'filepath': list_img, 'class': 'cat'}  # placeholder label; any known class name works
print(a)
predict_df=pd.DataFrame(a)
print(predict_df)


Preprocess the images to predict. Only the rescaling is applied here; no random augmentation is needed at inference time.

predict_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255)
predict_generator = predict_datagen.flow_from_dataframe(
    predict_df,
    directory='./',
    x_col='filepath',
    y_col='class',
    classes=class_names,
    target_size=(height, width),
    batch_size=batch_size,
    seed=7,
    shuffle=False,
    class_mode='sparse')  # labels as integer class indices
predict_num = predict_generator.samples

print(predict_num)

Run the prediction. Both images are classified correctly, as a truck and a horse respectively.

predict_img = test_model.predict(predict_generator)
predict_class_indices = np.argmax(predict_img, axis=1)
predict_class = [class_names[index]
                 for index in predict_class_indices]
print(predict_class)
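For a quick sanity check you can also skip the generator and feed one image directly; load_img resizes it to the network's input size, and the manual 1/255 matches the generator's rescale:

from tensorflow.keras.preprocessing.image import load_img, img_to_array

img = img_to_array(load_img(list_img[0], target_size=(height, width))) / 255.
pred = test_model.predict(np.expand_dims(img, axis=0))
print(class_names[np.argmax(pred[0])])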


That's it for this post. If you want to get into machine learning, try to use a machine with a reasonably good GPU; otherwise training will be painfully slow, like mine was.

Comments and feedback are welcome.
