1. Introduction: Weed Detection
Problem statement:
Weeds are unwelcome invaders in agricultural operations. They disrupt cultivation by stealing nutrients, water, land, and other critical resources, leading to lower yields and inefficient use of resources. A common countermeasure is spraying herbicides to eliminate weeds, but herbicides pose health risks to humans. Our goal is to use computer vision to detect weeds automatically, develop a system that sprays herbicide only on the weeds rather than on the crops, and remove the weeds from the field with targeted treatment, minimizing their negative environmental impact.
Expected solution:
We expect the model to be deployed in a simulated production environment, where inference time and binary-classification accuracy (F1 score) are the main grading criteria.
Dataset:
https://filerepo.idzcn.com/hack2023/Weed_Detection5a431d7.zip
Sample images:
Sample labels:
crop:0 0.478516 0.560547 0.847656 0.625000
weed:1 0.514648 0.441406 0.861328 0.671875
2. Data Preprocessing
Dataset structure:
The dataset is divided into images and labels.
Label meaning: the first value is the class, 0 for crop and 1 for weed. The remaining four values are x_center, y_center, width, and height, all normalized to the image dimensions. The meaning of the crop label shown above is illustrated below:
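A label line in this format can be parsed with a few lines of plain Python. The sketch below converts the normalized box coordinates back to pixels; the 512×512 image size here is an assumption for illustration, not taken from the dataset.

```python
def parse_label(line, img_w, img_h):
    """Split 'class x_center y_center width height' and convert
    the normalized box coordinates back to pixel units."""
    parts = line.split()
    cls = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    return {
        'class': 'weed' if cls == 1 else 'crop',
        'x_center': x_c * img_w,
        'y_center': y_c * img_h,
        'width': w * img_w,
        'height': h * img_h,
    }

# the crop label shown above, assuming a 512x512 image
box = parse_label('0 0.478516 0.560547 0.847656 0.625000', 512, 512)
print(box['class'], box['width'], box['height'])
```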
Dataset processing:
First, write all image file names into data.txt for later processing, then apply contrast enhancement and normalization to the images. The code is as follows:
def get_file_name(images_dir):
    # collect and sort all .jpeg file names
    images_files = [f for f in os.listdir(images_dir) if f.endswith('.jpeg')]
    images_files.sort()
    with open(r'./data.txt', 'w') as f:  # 'w' so reruns do not append duplicates
        for name in images_files:
            f.write(name + '\n')
    print('File list written!')

get_file_name('/content/drive/MyDrive/weeds/data')
transformer = transforms.Compose([
    transforms.ToTensor(),
    transforms.ColorJitter(contrast=0.5),  # contrast enhancement
    transforms.Normalize(mean=[0.5], std=[0.5])  # normalization
])
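ToTensor scales pixel values from [0, 255] down to [0.0, 1.0], and Normalize(mean=0.5, std=0.5) then applies (x − mean) / std, mapping that range onto [−1, 1]. A minimal pure-Python sketch of the arithmetic:

```python
def normalize(x, mean=0.5, std=0.5):
    # the same (x - mean) / std formula transforms.Normalize applies per channel
    return (x - mean) / std

print(normalize(0.0))  # -1.0 (black pixel)
print(normalize(1.0))  #  1.0 (white pixel)
print(normalize(0.5))  #  0.0 (mid gray)
```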
train_images_tensor = []
with open(r'/content/data.txt','r') as f:
file_name_url=[i.split('\n')[0] for i in f.readlines()]
for i in range(len(file_name_url)):
image = Image.open('/content/drive/MyDrive/weeds/data/'+file_name_url[i])
tensor = transformer(image.convert('L')).type(torch.float16)
train_images_tensor.append(tensor)
Next, process the labels and split the dataset into train and test sets: the train set takes 70% of the samples and the test set the remaining 30%.
image_train = []
image_test = []
for i in range(len(train_images_tensor)):
    if i <= len(train_images_tensor) * 0.7:
        image_train.append(train_images_tensor[i])
    else:
        image_test.append(train_images_tensor[i])
train_lables_tensor = []
with open(r'/content/data.txt', 'r') as f:
    file_name_url = [i.split('.')[0] for i in f.readlines()]
for i in range(len(file_name_url)):
    with open('/content/drive/MyDrive/weeds/data/' + file_name_url[i] + '.txt') as label_file:
        # the first character of each label line is the class id (0 or 1)
        labels = float(label_file.readline()[0])
    tensor = torch.tensor(labels, dtype=torch.float16)  # store as float16
    train_lables_tensor.append(tensor)
lables_train = []
lables_test = []
for i in range(len(train_lables_tensor)):
    if i <= len(train_lables_tensor) * 0.7:
        lables_train.append(train_lables_tensor[i])
    else:
        lables_test.append(train_lables_tensor[i])
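The sequential 70/30 split above keeps the sorted file order, so if the files happen to be grouped by class the test set can end up skewed. A hedged alternative (a sketch, not the method used in this report) is to shuffle index lists with a fixed seed and apply the same lists to both images and labels so the pairs stay aligned:

```python
import random

def split_indices(n, train_frac=0.7, seed=42):
    """Shuffle indices 0..n-1 with a fixed seed and split into train/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

# e.g. for a hypothetical dataset of 1000 samples:
train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 700 300
```

The same `train_idx`/`test_idx` would then index into `train_images_tensor` and `train_lables_tensor` to build the two splits.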
Building the dataset:
Use the PyTorch framework to build an image-classification dataset and data loaders.
train_datas_tensor = torch.stack(image_train)
train_labels_tensor = torch.stack(lables_train)
test_datas_tensor = torch.stack(image_test)
test_labels_tensor = torch.stack(lables_test)
# Note the order: each batch yields (labels, images)
train_dataset = TensorDataset(train_labels_tensor, train_datas_tensor)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataset = TensorDataset(test_labels_tensor, test_datas_tensor)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)  # no shuffling needed at test time
3. Training with ResNet50
ResNet50:
The key feature of ResNet50 is its use of residual blocks, which effectively mitigate the degradation problem of deep networks and improve performance and stability.
The backbone of ResNet50 is shown in the figure below:
The structure of ResNet50 can be divided into several parts. The first part is a convolutional layer (followed by max pooling) that preprocesses the input image and reduces its spatial size. The next four parts are stages composed of residual blocks, which extract image features; the first residual block of each stage after the first downsamples the feature map, halving its spatial size while increasing the number of channels. Each bottleneck residual block consists of three convolutional layers and a skip connection that adds the block's input to its output, forming the residual-learning mechanism. The final part is a global average pooling layer and a fully connected layer, which aggregate the features and output the classification result.
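The downsampling described above can be checked with the standard convolution/pooling output-size formula, floor((W − k + 2p) / s) + 1. The sketch below walks a 224×224 input (a common assumption; the dataset's actual resolution may differ) through the stem and the three downsampling stages:

```python
def conv_out(size, kernel, stride, padding):
    # standard convolution/pooling output-size formula
    return (size - kernel + 2 * padding) // stride + 1

size = 224                       # assumed input resolution
size = conv_out(size, 7, 2, 3)   # stem: 7x7 conv, stride 2 -> 112
size = conv_out(size, 3, 2, 1)   # 3x3 max pool, stride 2   -> 56
for stage in range(3):           # each later stage downsamples by 2
    size = conv_out(size, 3, 2, 1)
print(size)  # 7: global average pooling then reduces this to 1x1
```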
Building ResNet50:
We use the PyTorch framework to build a ResNet50 network. The hand-built residual network below illustrates the idea; the model actually trained is torchvision's pretrained resnet50, which replaces it further down.
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        if use_conv:
            # 1x1 convolution to match shapes on the skip connection
            self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)
b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels, use_conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))
# Replace the hand-built network with torchvision's pretrained ResNet50
net = torchvision.models.resnet50(pretrained=True)
# adapt the first convolution to single-channel (grayscale) input
net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# replace the classification head with a 2-class output (crop / weed)
num_features = net.fc.in_features
net.fc = nn.Linear(num_features, 2)
Training on the GPU:
Set the training parameters: cross-entropy as the loss function and stochastic gradient descent (SGD) with momentum as the optimizer.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device).half()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
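SGD with momentum, as configured above (and with PyTorch's default dampening of 0), maintains a velocity v ← μ·v + g and updates each parameter as p ← p − lr·v. A tiny pure-Python sketch of two updates on a scalar parameter:

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update on a scalar parameter."""
    velocity = momentum * velocity + grad   # accumulate velocity
    param = param - lr * velocity           # step against the velocity
    return param, velocity

p, v = 1.0, 0.0
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)  # v = 2.0
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)  # v = 3.8: repeated gradients build speed
print(round(p, 6), round(v, 6))  # 0.9942 3.8
```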
Train for 19 epochs on the GPU and export the model.
for epoch in range(1, 20):
    running_loss = 0.0
    num_images = 0
    loop = tqdm(enumerate(train_dataloader, 0))
    for step, data in loop:
        # each batch is (labels, images) because of the TensorDataset order above
        labels, inputs = data[0].to(device).float(), data[1].to(device).float()
        optimizer.zero_grad()
        inputs = inputs.half()  # match the half-precision model
        outputs = net(inputs)
        # cross-entropy expects integer class targets
        loss = criterion(outputs, labels.long())
        loss.backward()
        optimizer.step()
        num_images += inputs.size(0)
        running_loss += loss.item()
        loop.set_description(f'Epoch [{epoch}/19]')
        loop.set_postfix(loss=running_loss / (step + 1))
print('Finished training!')
torch.save(net.state_dict(), '/content/drive/MyDrive/weeds/detectmodles/resnet.pth')
Testing the model:
Measure the exported model's prediction time and F1 score on the test set.
all_predictions = []
all_labels = []
net = net.float()  # convert back from half precision once, before timing
net.eval()         # switch BatchNorm layers to inference mode
start_time = time.time()
with torch.no_grad():
    for data in test_dataloader:
        # batches are (labels, images); labels converted to integer type
        images, labels = data[1].to('cuda').float(), data[0].to('cuda').long()
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        all_predictions.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
end_time = time.time()
elapsed_time = end_time - start_time
print(f'Time taken on the test set: {elapsed_time:.2f} seconds')
f1 = f1_score(all_labels, all_predictions, average='binary')
print(f'Test F1 score: {f1:.4f}')
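The F1 score reported by `f1_score(average='binary')` is the harmonic mean of precision and recall for the positive (weed) class. A sketch computing it from raw counts, with hypothetical true/false positive and false negative counts chosen for illustration:

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2PR / (P + R), from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)  # of predicted weeds, how many really are weeds
    recall = tp / (tp + fn)     # of actual weeds, how many were found
    return 2 * precision * recall / (precision + recall)

# hypothetical counts for the weed class
print(round(f1_from_counts(tp=96, fp=2, fn=2), 4))  # 0.9796
```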
The results are as follows:
Display one image from the test set:
import matplotlib.pyplot as plt
# draw one batch from the test loader; each batch is (labels, images)
true_label, sample_image = next(iter(test_dataloader))
sample_image, true_label = sample_image.to('cuda').float(), true_label.to('cuda').long()
with torch.no_grad():
    net = net.float()
    model_output = net(sample_image)
_, predicted_label = torch.max(model_output, 1)
sample_image = sample_image.cpu().numpy()[0]
predicted_label = predicted_label[0].item()
true_label = true_label[0].item()
# class names
class_labels = ['crop', 'weed']
# undo Normalize(mean=0.5, std=0.5) and show the single-channel image
plt.imshow(sample_image[0] * 0.5 + 0.5, cmap='gray')
plt.title(f'TRUE LABEL IS: {class_labels[true_label]}, PREDICT LABEL IS: {class_labels[predicted_label]}')
plt.axis('off')
plt.show()
The results are as follows:
4. Moving the Model to the CPU
Building the ResNet50 model:
Construct the model skeleton (the same definitions used in the training section).
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        if use_conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels, use_conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))
resnet50_model = torchvision.models.resnet50(pretrained=True)
resnet50_model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
num_features = resnet50_model.fc.in_features
resnet50_model.fc = nn.Linear(num_features, 2)
Load the trained weights and move the model to the CPU.
resnet50_model.load_state_dict(torch.load('resnet.pth', map_location=torch.device('cpu')))
net = resnet50_model
net.to('cpu')
net.eval()  # switch BatchNorm layers to inference mode
CPU test:
On the CPU, the prediction time and F1 score measured on the course's public test set are 9.96 s and 0.9796, respectively.
all_predictions = []
all_labels = []
net = net.float()
start_time = time.time()
with torch.no_grad():
    for data in test_dataloader:
        # labels converted to integer type
        images, labels = data[1].to(torch.device('cpu')).float(), data[0].to(torch.device('cpu')).long()
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        all_predictions.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
end_time = time.time()
elapsed_time = end_time - start_time
print(f'Time taken on the test set: {elapsed_time:.2f} seconds')
f1 = f1_score(all_labels, all_predictions, average='binary')
print(f'Test F1 score: {f1:.4f}')
The test results are as follows:
5. Optimizing with oneAPI Tools
oneAPI optimization:
Here we use Intel Extension for PyTorch (IPEX) and Intel Optimization for PyTorch. Redefine the loss function and optimizer, then optimize the model with the oneAPI component.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
net, optimizer = ipex.optimize(net, optimizer=optimizer)
Testing the optimized model:
Test on the course's public test set.
all_predictions = []
all_labels = []
net = net.float()
start_time = time.time()
with torch.no_grad():
    for data in test_dataloader:
        # labels converted to integer type
        images, labels = data[1].to(torch.device('cpu')).float(), data[0].to(torch.device('cpu')).long()
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        all_predictions.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
end_time = time.time()
elapsed_time = end_time - start_time
print(f'Time taken on the test set: {elapsed_time:.2f} seconds')
f1 = f1_score(all_labels, all_predictions, average='binary')
print(f'Test F1 score: {f1:.4f}')
The results are as follows:
After oneAPI optimization, the model's prediction time is 6.25 s and the F1 score is 0.9796. The prediction time is 3.71 s shorter than before, a clear speedup, while the F1 score is unchanged, so no accuracy is lost. Because this test set is small, the gap in prediction time would grow on a larger test set, making the benefit of the oneAPI components even more pronounced.
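The reported timings correspond to roughly a 1.6x speedup, which the arithmetic below makes explicit:

```python
def speedup(before_s, after_s):
    """Relative speedup ratio and percentage of time saved."""
    return before_s / after_s, (before_s - after_s) / before_s * 100

# the CPU timings reported above: 9.96 s before, 6.25 s after optimization
ratio, saved_pct = speedup(9.96, 6.25)
print(f'{ratio:.2f}x faster, {saved_pct:.1f}% less time')  # 1.59x faster, 37.2% less time
```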