Intel University-Industry Collaboration: Dogs vs. Cats

Latest recommended article published on 2024-09-27 19:01:03

星星不想醒

Tags: machine learning, deep learning, artificial intelligence

Article link: https://blog.csdn.net/m0_67197896/article/details/135553076

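1: Introduction to Dogs vs. Cats

1.1 Problem description

This problem is a classic machine learning classification challenge: Dogs vs. Cats. The task is to build a classification model that can accurately tell whether an image shows a cat or a dog.

1.2 Expected solution

The expected solution is a binary classification model that takes an image as input and accurately predicts whether it shows a cat or a dog.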

1.3 Dataset

Link: Baidu Netdisk (extraction code required)

Extraction code: jc34

1.4 Sample images

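2: Data preprocessing

2.1 Dataset structure

This dataset is used only to train the model. The test set used here is not the test folder shipped with the dataset, because the images in that folder have no labels.

2.2 Dataset overview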

import os

train_path = './data/train'
test_path = './data/real_test'
train_file_names = os.listdir(train_path)
test_file_names = os.listdir(test_path)
print("Training set size: {}".format(len(train_file_names)))
print("Test set size: {}".format(len(test_file_names)))
print("Training set samples: {}".format(train_file_names[0:5]))  # training file names: label + index
print("Test set samples: {}".format(test_file_names[0:5]))  # test file names: index

2.3 Creating DataFrame objects for the training and test sets

# Training set: train_image_path, label
import os
import pandas

image_paths = []
labels = []
for train_file in train_file_names:
    label = train_file.split('.')[0]  # the label is the part of the file name before the first dot
    labels.append(label)
    image_path = os.path.join(train_path, train_file)
    image_paths.append(image_path)

train_df = pandas.DataFrame()
train_df['train_image_path'] = image_paths  # assign the list to a custom column name
train_df['label'] = labels
train_df.head()

# Test set
import os
import pandas

image_paths = []
labels = []
for test_file in test_file_names:
    label = test_file.split('.')[0]
    labels.append(label)
    image_path = os.path.join(test_path, test_file)
    image_paths.append(image_path)

test_df = pandas.DataFrame()
# The column keeps the name 'train_image_path' so that the test generator can use the same x_col
test_df['train_image_path'] = image_paths
test_df['label'] = labels
test_df.head()
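
The two loops above do the same thing for different folders, so they can be folded into one helper. The following is only a sketch: `build_rows` is a hypothetical name, and it assumes file names of the form `<label>.<index>.<ext>` (for example `cat.0.jpg`), which is what the `split('.')[0]` call above relies on.

```python
import os


def build_rows(directory):
    """Return one {'train_image_path', 'label'} dict per file in `directory`.

    Assumes file names look like '<label>.<index>.<ext>', e.g. 'cat.0.jpg'.
    """
    rows = []
    for file_name in sorted(os.listdir(directory)):
        rows.append({
            "train_image_path": os.path.join(directory, file_name),
            "label": file_name.split(".")[0],  # text before the first dot
        })
    return rows
```

The returned list can be passed straight to pandas, for example `train_df = pandas.DataFrame(build_rows(train_path))`.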

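2.4 Splitting into training and validation sets

The train dataset contains 25,000 images, half cats and half dogs. Stratified sampling is used here so that both subsets keep this ratio.

from sklearn.model_selection import train_test_split

# stratify keeps the cat/dog ratio in both subsets; test_size is left at its default of 0.25
train_set, val_set = train_test_split(train_df, random_state=42, stratify=train_df['label'])
print("Training set size: {}".format(len(train_set)))
print("Validation set size: {}".format(len(val_set)))
train_set['label'].hist()
val_set['label'].hist()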
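
To make the idea of stratified sampling concrete, here is a small plain-Python sketch. It is not how scikit-learn implements `train_test_split`; `stratified_split` is a hypothetical helper that only illustrates the principle of splitting each class separately so that the class ratio is preserved. The 25% validation fraction mirrors the scikit-learn default.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_indices, val_indices) with per-class proportions preserved."""
    # Group sample indices by class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class
        n_val = round(len(indices) * test_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


labels = ["cat"] * 12500 + ["dog"] * 12500
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 18750 6250
```

With 12,500 images per class, each class contributes 3,125 images to the validation set, so both subsets stay balanced.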

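2.5 Data augmentation

ImageDataGenerator is used to create a training data generator, a validation data generator, and a test data generator. Only the training generator applies augmentation; all three rescale pixel values to the range [0, 1].

from tensorflow.keras.preprocessing.image import ImageDataGenerator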

train_gen = ImageDataGenerator(
    zoom_range=0.1,
    rotation_range=10,
    rescale=1./255,
    shear_range=0.1,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)
train_generator = train_gen.flow_from_dataframe(
    dataframe=train_set,
    x_col='train_image_path',
    y_col='label',
    target_size=(200,200),
    class_mode='binary',
    batch_size=128,
    shuffle=False
)
print(len(train_generator))

# Validation set
val_gen = ImageDataGenerator(
    rescale=1./255
)
val_generator = val_gen.flow_from_dataframe(
    dataframe=val_set,
    x_col='train_image_path',
    y_col='label',
    target_size=(200,200),
    class_mode='binary',
    batch_size=128,
    shuffle=False
)
print(len(val_generator))

test_gen = ImageDataGenerator(rescale=1./255)  # the test set needs no augmentation, only rescaling
test_generator = test_gen.flow_from_dataframe(
    dataframe=test_df,
    x_col='train_image_path',
    y_col='label',
    target_size=(200,200),
    class_mode='binary',
    batch_size=128,
    shuffle=False
)
print(len(test_generator))
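
The `len(...)` of each generator is the number of batches per epoch, that is, the sample count divided by the batch size and rounded up. As a quick check, the sketch below assumes that all 25,000 training images are valid and that the default 75/25 split from section 2.4 was used; `num_batches` is a hypothetical helper, not part of Keras.

```python
import math


def num_batches(num_samples, batch_size=128):
    """Number of batches needed to cover num_samples once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


print(num_batches(18750))  # 147 batches for the training split
print(num_batches(6250))   # 49 batches for the validation split
```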

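3: Model training based on VGG16

3.1 Adapting VGG16 to this problem

Freeze the weights of the pretrained model and define an Adam optimizer with a learning rate of 0.001. The model is then compiled with the compile method, using binary cross-entropy as the loss function and accuracy (acc) as the metric.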

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam

vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(200, 200, 3))

model = Sequential()
model.add(vgg_model)
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))

# Freeze the weights of the pretrained model
vgg_model.trainable = False

optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
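
The parameter counts that `model.summary()` reports can be sanity-checked by hand. The sketch below recomputes them from the layer shapes, assuming the standard VGG16 convolutional base (thirteen 3x3 convolutions and five 2x2 max-pooling layers) and the 200x200 input used above. The helper names are made up for this illustration.

```python
# (in_channels, out_channels) of each 3x3 convolution in VGG16's convolutional base
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


side = 200
for _ in range(5):  # each 2x2 max-pooling layer halves the side, rounding down
    side //= 2
flat = side * side * 512  # size of the Flatten output

frozen = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)
trainable = dense_params(flat, 128) + dense_params(128, 1)
print(side, flat)         # 6 18432
print(frozen, trainable)  # 14714688 2359553
```

Only the two Dense layers are trained, so roughly 2.36 million of the model's parameters are updated while the 14.7 million VGG16 parameters stay fixed.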

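3.2 Training for 10 epochs

The network is trained on the training set and validated on the validation set. Callbacks save the best model and adjust the learning rate, so the trained model ends up saved to disk.

import os
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

# Callback that saves the best model (lowest val_loss) to the path loaded in section 4
os.makedirs('./model', exist_ok=True)
cp_callback = ModelCheckpoint(filepath='./model/vgg_model.h5', save_best_only=True)

# Multiply the learning rate by 0.1 when val_loss has not improved for 3 epochs, down to 1e-5
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=3, factor=0.1, min_lr=0.00001)

# The batch size is already set by the generator, so it is not passed to fit
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator,
    callbacks=[cp_callback, reduce_lr],
    verbose=1
)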
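
To see what the learning-rate callback does with the settings above, the following plain-Python sketch replays its basic rule on a made-up list of validation losses. It is a simplification rather than the Keras implementation: it ignores the `min_delta` and `cooldown` options of `ReduceLROnPlateau`, and `simulate_reduce_lr` is a hypothetical name.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.1, patience=3, min_lr=0.00001):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the learning rate, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation losses: no improvement in epochs 3 to 5
losses = [0.30, 0.25, 0.26, 0.27, 0.28, 0.24]
print(simulate_reduce_lr(losses))
```

In this made-up run the learning rate stays at 0.001 for the first four epochs and drops to about 0.0001 after the fifth, the third epoch in a row without improvement.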

3.3 Training results

import matplotlib.pyplot as plt

# Training acc/loss
acc = history.history['accuracy']
loss = history.history['loss']
# Validation acc/loss
val_acc = history.history['val_accuracy']
val_loss = history.history['val_loss']

# Accuracy curves
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Acc')
plt.plot(val_acc, label='Validation Acc')
plt.title('Training and Validation ACC')
plt.legend()

# Loss curves
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

After 10 epochs of training, acc is above 0.93 and loss is below 0.17, which is a good result.

4: Prediction

4.1 Predicting on the validation set

from sklearn.metrics import f1_score
import numpy as np
import tensorflow as tf

# Load the saved model and run prediction
loaded_model = tf.keras.models.load_model('./model/vgg_model.h5')

predictions = loaded_model.predict(val_generator, steps=len(val_generator))

# Convert the predicted probabilities to class labels (threshold 0.5)
predicted_classes = (predictions.ravel() > 0.5).astype(int)

# Compute and print the F1 score
true_labels = val_generator.classes
f1 = f1_score(true_labels, predicted_classes)
print("F1 score:", f1)

Prediction completed in 229 s, with an F1 score of 0.937.
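
For readers who want to see what the F1 score measures, here is a small plain-Python sketch of the same computation: threshold the sigmoid outputs at 0.5, count true positives, false positives, and false negatives, then combine precision and recall. `f1_from_probabilities` is a hypothetical helper and the sample values are made up. Class 1 is treated as the positive class, which corresponds to "dog" if the class indices were assigned in alphabetical order.

```python
def f1_from_probabilities(true_labels, probabilities, threshold=0.5):
    """Binary F1 score with class 1 as the positive class."""
    predicted = [int(p > threshold) for p in probabilities]
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 2 true positives, 1 false positive, 1 false negative
print(f1_from_probabilities([0, 0, 1, 1, 1], [0.1, 0.7, 0.8, 0.4, 0.9]))  # about 0.667
```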

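4.2 Accelerating with oneAPI

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress unnecessary warning messages

# Enable oneDNN (part of oneAPI) optimizations; environment variables must be set before TensorFlow is imported
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'

import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

# Load the trained model
model = tf.keras.models.load_model('./model/vgg_model.h5')

# Run inference on the CPU; the batch size (128) is already set by the generator
with tf.device('/CPU:0'):
    predictions = model.predict(val_generator)

# Save the model
model.save('./model/oneAPI_model.h5')

# Convert the predicted probabilities to class labels (threshold 0.5)
predicted_classes = (predictions.ravel() > 0.5).astype(int)

# Compute and print the F1 score
true_labels = val_generator.classes
f1 = f1_score(true_labels, predicted_classes)
print("F1 score:", f1)

Prediction completed in 198 s, and the F1 score is again 0.937. The prediction time dropped while the F1 score stayed the same, which is the expected effect of using oneAPI.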
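
The article reports the two prediction times but does not show how they were measured. One possible way, sketched below, is to wrap the call in a small timing helper and then compare the two runs. `timed`, `speedup`, and `time_saved_percent` are hypothetical names, and the 229 s and 198 s figures are the ones reported above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is."""
    return baseline_seconds / accelerated_seconds


def time_saved_percent(baseline_seconds, accelerated_seconds):
    """Share of the baseline time that was saved, in percent."""
    return 100 * (baseline_seconds - accelerated_seconds) / baseline_seconds


print(round(speedup(229, 198), 2))             # 1.16
print(round(time_saved_percent(229, 198), 1))  # 13.5
```

A prediction run could then be measured with `predictions, seconds = timed(model.predict, val_generator)`.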

4.3 Predicting on the test set with the oneAPI-accelerated model

from sklearn.metrics import f1_score
import numpy as np
import tensorflow as tf

# Load the saved model and run prediction
loaded_model = tf.keras.models.load_model('./model/oneAPI_model.h5')

predictions = loaded_model.predict(test_generator, steps=len(test_generator))

# Convert the predicted probabilities to class labels (threshold 0.5)
predicted_classes = (predictions.ravel() > 0.5).astype(int)

# Compute and print the F1 score
true_labels = test_generator.classes
f1 = f1_score(true_labels, predicted_classes)
print("F1 score:", f1)