Training on a custom dataset with TensorFlow 2.x

smallworldxyl

Category column: Deep Learning/tensorflow. Article tags: python, pytorch, deep learning

Article link: https://blog.csdn.net/smallworldxyl/article/details/120836022

1. Import libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
import os
import pathlib
import random
import matplotlib.pyplot as plt

2. Read the files
The data is a set of images of humans and horses, stored under the parent directory humanandhorse, with one subdirectory per class.

data_root = pathlib.Path('E:/tensorflowdataset/humanandhorse')
print(data_root)
for item in data_root.iterdir():
  print(item)

E:\tensorflowdataset\humanandhorse
E:\tensorflowdataset\humanandhorse\horses
E:\tensorflowdataset\humanandhorse\humans

Use the glob method to collect the image paths into a list, and count how many images there are.

all_image_paths = list(data_root.glob('*/*'))
print(all_image_paths[:10])
all_image_paths = [str(path) for path in all_image_paths]
print(all_image_paths[:10])
image_count = len(all_image_paths)
print(image_count)

[WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-000.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-105.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-122.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-127.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-170.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-204.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-224.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-241.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-264.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-276.png')]
['E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-000.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-105.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-122.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-127.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-170.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-204.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-224.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-241.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-264.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-276.png']
256
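To try the glob pattern without the real dataset, here is a minimal sketch that builds a throwaway directory tree and counts the files per class. The helper name count_images_per_class and the file names are made up for illustration; only the '*/*' pattern comes from the code above.

```python
import pathlib
import tempfile
from collections import Counter


def count_images_per_class(root):
    """Count files exactly one level below `root`, grouped by parent folder name."""
    root = pathlib.Path(root)
    # '*/*' matches every entry that sits one directory below root
    paths = sorted(p for p in root.glob('*/*') if p.is_file())
    return Counter(p.parent.name for p in paths)


# Build a temporary tree with hypothetical file names to exercise the helper
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [('horses', 3), ('humans', 2)]:
        class_dir = pathlib.Path(tmp) / cls
        class_dir.mkdir()
        for i in range(n):
            (class_dir / f'{cls}-{i:03d}.png').touch()
    counts = count_images_per_class(tmp)

print(dict(counts))
```

Counting per class this way is also a quick check that the two classes are roughly balanced before training.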
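3. Show some images
label = image_path.split('\\')[-2] takes the second-to-last path component, i.e. the name of the image's parent directory, as its label. Note that splitting on a backslash only works for Windows-style paths.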

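import matplotlib.pyplot as plt
from PIL import Image

plt.figure('image show')
for n in range(3):
    # Pick a random image; its parent folder name is the label (Windows path separator)
    image_path = random.choice(all_image_paths)
    label = image_path.split('\\')[-2]
    image = Image.open(image_path)
    print(image.size)  # (width, height)

    # Draw the image in the n-th cell of a 1x3 grid, titled with its label
    plt.subplot(1, 3, n + 1)
    plt.title(label)
    plt.imshow(image)
plt.show()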
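Splitting on '\\' only works for Windows paths. One more portable option, sketched here under the assumption that the directory layout stays the same, is to let pathlib parse the path and take the parent directory name. The helper label_from_path and the POSIX-style path are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def label_from_path(path, flavour=PureWindowsPath):
    """Return the name of the directory that directly contains the file."""
    return flavour(path).parent.name


win_path = 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-000.png'
posix_path = '/data/humanandhorse/humans/human01-00.png'  # hypothetical path

win_label = label_from_path(win_path)
posix_label = label_from_path(posix_path, PurePosixPath)
split_label = win_path.split('\\')[-2]  # the approach used in the article

print(win_label, posix_label, split_label)
```

In a real script, pathlib.Path picks the right flavour for the current operating system, which is what the labeling code in section 4 relies on.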

4. Set up labels
First, find out which labels exist.

label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
print(label_names)

['horses', 'humans']

Map each label name to an index, in sorted order.

label_to_index = dict((name, index) for index, name in enumerate(label_names))
print(label_to_index)

{'horses': 0, 'humans': 1}
Determine the label of each image.

all_image_labels = [label_to_index[pathlib.Path(path).parent.name]
                    for path in all_image_paths]

print(all_image_labels)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
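A long list of zeros and ones is hard to read. As a small sketch, the mapping can be inverted to turn indices back into names, and collections.Counter summarises how many images each class has. The example_labels list is a short hypothetical stand-in for all_image_labels.

```python
from collections import Counter

label_names = ['horses', 'humans']
label_to_index = {name: index for index, name in enumerate(label_names)}
# Inverse mapping: index -> label name, useful when reading predictions
index_to_label = {index: name for name, index in label_to_index.items()}

# Hypothetical stand-in for all_image_labels
example_labels = [0, 0, 0, 1, 1]
class_counts = Counter(index_to_label[i] for i in example_labels)

print(index_to_label)
print(dict(class_counts))
```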
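5. Preprocess the data
tf.io.read_file reads the raw bytes of the file at each image path; the bytes are then decoded into an image tensor, resized to 300x300, and each pixel value is scaled into the range [0, 1], which makes training easier.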


def preprocess_image(img_raw):
    # The dataset consists of PNG files, so decode them as PNG with 3 channels (RGB)
    img_tensor = tf.image.decode_png(contents=img_raw, channels=3)  # can be used for plt.imshow(img_tensor)
    img_final = tf.image.resize(images=img_tensor, size=[300, 300])  # returns float32
    img_final /= 255.0  # normalize to [0,1] range
    return img_final

def load_and_preprocess_image(path):
    img_raw = tf.io.read_file(path)  # raw encoded bytes; can't be used for plt.imshow(img_raw)
    return preprocess_image(img_raw)

def load_and_preprocess_from_path_label(path, label):
    # Keep the label unchanged and replace the path with the preprocessed image
    return load_and_preprocess_image(path), label
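The preprocessing steps can be checked without any files on disk. The sketch below encodes a random image as PNG bytes in memory and runs the same decode, resize, and scale steps on it. It assumes PNG input, and the 48x64 test image is arbitrary.

```python
import tensorflow as tf


def preprocess_image(img_raw, size=(300, 300)):
    """Decode PNG bytes, resize, and scale pixel values into [0, 1]."""
    img = tf.image.decode_png(img_raw, channels=3)  # uint8, shape (H, W, 3)
    img = tf.image.resize(img, size)                # float32 after resizing
    return img / 255.0


# Synthetic 48x64 RGB image, encoded as PNG bytes in memory
pixels = tf.random.uniform([48, 64, 3], maxval=256, dtype=tf.int32)
png_bytes = tf.io.encode_png(tf.cast(pixels, tf.uint8))

result = preprocess_image(png_bytes)
print(result.shape, result.dtype)
```

The output always has shape (300, 300, 3) regardless of the input size, which is what the model's input_shape expects.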

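6. Build the dataset
Pair each image with its label.
The ds returned by tf.data.Dataset.from_tensor_slices has many useful methods for manipulating the dataset, such as shuffle, batch, and repeat, which make it convenient to feed the data into the model for training later.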

ds = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))
for item_x, item_y in ds:
    print(item_x.numpy(), item_y.numpy())

b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-000.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-105.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-122.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-127.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-170.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-204.png' 0
…
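To see what shuffle, batch, and repeat do, here is a small sketch on a toy dataset of the integers 0 to 9. The buffer size, batch size, and repeat count are arbitrary choices for illustration.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# shuffle: randomise the order; batch: group elements; repeat: iterate the data twice
pipeline = ds.shuffle(buffer_size=10, seed=0).batch(4).repeat(2)

# Each pass over the 10 elements produces batches of 4, 4 and 2
batch_lengths = [int(b.shape[0]) for b in pipeline]

# The first three batches together cover every element exactly once
seen = sorted(int(v) for b in pipeline.take(3) for v in b.numpy())

print(batch_lengths)
print(seen)
```

The order of the calls matters: batching before repeating keeps a short final batch at the end of every pass, while repeating first would let batches span two passes.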

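The training method model.fit() is typically called as `model.fit(x, y, batch_size, epochs)`.

If x is a Dataset object, y and batch_size should not be passed. In that case the Dataset must already yield batches, and each batch must be a (features, labels) tuple.

So we want the Dataset to store data in (features, labels) form. tf.data.Dataset.from_tensors() and tf.data.Dataset.from_tensor_slices() cover the two cases: from_tensors() wraps its whole input as a single dataset element, so use it when memory holds one ready-made (features, labels) pair; from_tensor_slices() slices its input along the first dimension, so use it when memory holds (all features, all labels) and each row should become its own element.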
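The difference between the two constructors is easiest to see on a tiny example. The sketch below uses made-up feature and label tensors and compares the element shapes each constructor produces.

```python
import tensorflow as tf

# Made-up data: three samples with two features each
features = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = tf.constant([0, 1, 0])

# from_tensors: the whole (features, labels) tuple becomes ONE element
whole = tf.data.Dataset.from_tensors((features, labels))
# from_tensor_slices: one element per row along the first dimension
sliced = tf.data.Dataset.from_tensor_slices((features, labels))

whole_shapes = [x.shape.as_list() for x, _ in whole]
sliced_shapes = [x.shape.as_list() for x, _ in sliced]

print(whole_shapes)
print(sliced_shapes)
```

Because the article starts from a list of paths and a list of labels of equal length, from_tensor_slices is the natural fit: each (path, label) row becomes one element.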

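# The paths are sorted by class, so shuffle the (path, label) pairs before loading the images;
# otherwise the dataset yields every horse before any human
image_label_ds = ds.shuffle(buffer_size=image_count).map(load_and_preprocess_from_path_label)
image_label_ds = image_label_ds.batch(1)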

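Here image_label_ds yields pairs of a 1×300×300×3 image tensor and its label (ds.map returns a MapDataset, which batch then wraps in a BatchDataset). One such element looks like this: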
[[[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]
…

[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]]] [0]
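The batch() method splits the dataset into batches by adding a new leading (batch) dimension to each element.
Without batch(), training fails with the error: expected conv2d_10_input to have 4 dimensions, but got array with shape (300,300,3), which means the input dimensions do not match what the model expects.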
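The added dimension is visible in the dataset's element_spec. This sketch uses six all-zero placeholder images instead of the real data, and a batch size of 4 chosen only for illustration.

```python
import tensorflow as tf

# Six placeholder "images" with matching labels
images = tf.zeros([6, 300, 300, 3])
labels = tf.constant([0, 0, 0, 1, 1, 1])
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# Before batching, each element is a single image of rank 3
unbatched_shape = ds.element_spec[0].shape.as_list()

# After batching, a leading batch dimension appears (None: the last batch may be smaller)
batched = ds.batch(4)
batched_shape = batched.element_spec[0].shape.as_list()
batch_sizes = [int(x.shape[0]) for x, _ in batched]

print(unbatched_shape, batched_shape, batch_sizes)
```

The rank-4 shape (batch, height, width, channels) is what Conv2D expects, which is why the unbatched dataset triggers the dimension error.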
7. Build and train the model

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 300x300 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 1 class ('horses') and 1 for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
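from tensorflow.keras.optimizers import RMSprop
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),  # `lr` is a deprecated alias of learning_rate
              metrics=['acc'])
history = model.fit(
    image_label_ds,
    steps_per_epoch=8,  # batches drawn per epoch; with batch size 1 that is 8 images per epoch
    epochs=15,
)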
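It can help to work out by hand what the Flatten layer receives. The sketch below repeats the arithmetic for the five Conv2D + MaxPooling2D pairs, relying on the Keras defaults used by the model: 'valid' padding with stride 1 for the 3x3 convolutions, and 2x2 pooling with stride 2. The helper name conv_pool_output is made up.

```python
def conv_pool_output(size, kernel=3, pool=2):
    """Spatial size after a 'valid' 3x3 convolution followed by 2x2 max pooling."""
    after_conv = size - kernel + 1  # 'valid' padding, stride 1
    return after_conv // pool       # pooling discards any leftover row/column


size = 300
sizes = []
for _ in range(5):  # five Conv2D + MaxPooling2D pairs
    size = conv_pool_output(size)
    sizes.append(size)

# The last convolution has 64 filters, so Flatten yields size * size * 64 values
flattened = size * size * 64
# Dense(512): one weight per (input, unit) pair plus one bias per unit
dense_params = flattened * 512 + 512

print(sizes, flattened, dense_params)
```

These numbers should match the output shapes reported by model.summary().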
