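RCNN Face Detection
In 1966, Marvin Minsky asked his student Gerald Jay Sussman to spend a summer connecting a camera to a computer and getting the computer to describe what it saw. This is often told as the origin of object detection.
In 2014, Ross Girshick (RBG) proposed the RCNN model. It replaced the sliding-window strategy with Selective Search and used a CNN to extract image features, which arguably made it the first solution usable at an industrial level and set off a wave of research in object detection.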
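Preface
This post follows the ideas of the RCNN paper and applies them to face detection: given an RGB image, find the location of the face region. Face detection is a simpler problem than the one the original model solves, so I made small changes to the CNN architecture and reduced the number of layers. Because I work with the LFW face dataset and only detect a single face per image, I also dropped the SVM stage and the non-maximum suppression (NMS) step that follow the CNN in RCNN, and used an averaging strategy instead to make the detected region more robust.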
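1. Dataset Preparation
I downloaded the LFW face dataset and took nearly 5k face images from it.
The first difficulty was that I had no dataset with face-location annotations. I ran MATLAB's Viola-Jones detector on the LFW images and kept the detected face regions, which gave me 4k standard face images for later training.
The second difficulty was that I had no negative samples for the face / non-face binary classifier. I downloaded a dataset of 3k landscape images to serve as negatives.
The third difficulty is that RCNN ultimately classifies the regions produced by Selective Search, and those regions may be somewhat larger or somewhat smaller than the actual face. If the network is trained only on standard faces, it ends up labeling many regions that really are faces as 0, and detection fails.
Following the pre-training and fine-tuning idea in RCNN, I adopted the following strategy.
First, run Selective Search on the whole LFW dataset and store every proposed region in one folder. This gave roughly 100k region images.
Hand-pick nearly 500 face regions and nearly 500 non-face regions, and train a very simple network for binary classification. With so little labeled data, only a simple network avoids overfitting.
Use the trained simple network to classify part of the region images. Based on the results, add the correctly classified face regions to the positive set and the misclassified samples to the negative set, then keep training while gradually increasing network complexity. After repeating this process 3 times, I had 3674 face-region images and 4440 non-face-region images.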
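One round of the loop above can be sketched as follows. This is only an illustration: `sort_round`, its arguments, and the file names are hypothetical, the manual check is represented by a plain list of booleans, and reading "misclassified samples" as non-face regions that the network predicted as faces is my assumption.

```python
def sort_round(region_names, predicted_face, really_face):
    """Route classified regions into new positive / negative samples.

    region_names   : identifiers of the region images (hypothetical)
    predicted_face : network decision per region (True = face)
    really_face    : result of a manual check per region (True = face)
    """
    new_positive, new_negative = [], []
    for name, pred, truth in zip(region_names, predicted_face, really_face):
        if pred and truth:
            new_positive.append(name)   # correctly classified face region
        elif pred and not truth:
            new_negative.append(name)   # false positive, kept as a hard negative
    return new_positive, new_negative


names = ["r1.jpg", "r2.jpg", "r3.jpg", "r4.jpg"]
pos, neg = sort_round(names,
                      predicted_face=[True, True, False, True],
                      really_face=[True, False, False, True])
print(pos, neg)
```

Feeding false positives back as negatives is what pushes the next, slightly larger network to reject partial faces and background.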
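At this point I have 3674 + 4000 = 7674 face images and 4440 + 3000 = 7440 non-face images, 15114 in total. I resized all of them to (96, 96). All of these images carry class labels, which is enough to train a reasonably good network. The split is 12000 for training and 3114 for testing: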
train_x : (12000, 96, 96)
train_y : (12000, 2)
test_x : (3114, 96, 96)
test_y : (3114, 2)
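2. Network Architecture
I only need to separate face images from non-face images, so the network does not have to be very complex. That ruled out deep architectures such as VGG (16 to 19 weight layers).
I tried LeNet, but the results were not good. The likely reason is that I need to separate half-face regions from full-face regions, so that incomplete partial faces are also classified as 0. A shallow, simple model cannot extract features at that level.
In the end I made some small adjustments on top of AlexNet, and the results were acceptable.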
np.random.seed(1)
model = Sequential()
model.add(Conv2D(16, (5, 5), input_shape=(96, 96, 1), strides=1))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Activation('relu'))
model.add(Conv2D(32, (5, 5), strides=1))  # input_shape is only needed on the first layer
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3), strides=1))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Activation('relu'))
model.add(Conv2D(128, (3, 3), strides=1))
model.add(Activation('relu'))
model.add(Conv2D(256, (3, 3), strides=1))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))
model.summary()
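To see what `model.summary()` should report, the feature-map sizes can be traced by hand. The sketch below assumes the Keras defaults for `Conv2D` and `MaxPooling2D` (padding `'valid'`), since the code above does not set `padding`; the helper names `conv_out` and `pool_out` are mine.

```python
def conv_out(size, kernel, stride=1):
    # 'valid' padding: no border pixels are added
    return (size - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    # 'valid' pooling: a trailing odd row/column is dropped
    return (size - pool) // stride + 1


size = 96
size = pool_out(conv_out(size, 5))   # conv 5x5 -> 92, pool -> 46
size = pool_out(conv_out(size, 5))   # conv 5x5 -> 42, pool -> 21
size = pool_out(conv_out(size, 3))   # conv 3x3 -> 19, pool -> 9
size = conv_out(size, 3)             # conv 3x3 -> 7
size = conv_out(size, 3)             # conv 3x3 -> 5

flatten_units = size * size * 256             # 256 channels after the last conv
dense_params = flatten_units * 4096 + 4096    # weights + biases of Dense(4096)
print(size, flatten_units, dense_params)
```

Under these assumptions the first Dense layer alone holds about 26 million parameters, so most of the model's capacity sits there rather than in the convolutions.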
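3. Region Detection
Once the binary classifier is trained, face detection becomes straightforward. Take a single face image, run Selective Search on it to get many candidate regions, classify each candidate with the network, and record the regions classified as 1. Those are the detected face regions.
This can yield many face boxes, all of them broadly acceptable, and we want to pick the best one as the final detection. The RCNN paper handles this by training an SVM on the CNN features to score each region and then filtering the scored boxes with non-maximum suppression.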
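For reference, greedy non-maximum suppression can be sketched in a few lines of NumPy. This is a generic illustration, not code from the RCNN paper or from this project; the function names, the 0.5 IoU threshold and the sample boxes are assumptions.

```python
import numpy as np


def iou(box, boxes):
    # boxes are (x1, y1, x2, y2); returns the IoU of `box` with every row of `boxes`
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # drop every remaining box that overlaps the kept one too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

The procedure depends entirely on the scores being distinguishable: it needs a ranking to decide which of two overlapping boxes survives.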
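I originally wanted to simply pick the region with the largest softmax output as the final detection, but ran into a problem in practice. The softmax output should be a number between 0 and 1 that can serve as a probability, but my network's outputs for the detected regions were all 1, so there was no score to rank the boxes by and non-maximum suppression could not proceed.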
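Whatever the cause in this project, a softmax that prints exactly 1 is easy to reproduce: once the gap between the two logits is large enough, the smaller term falls below floating-point resolution. The logits below are made-up values chosen only to show the effect.

```python
import numpy as np


def softmax(logits):
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


moderate = softmax(np.array([2.0, 0.0]))    # small logit gap
saturated = softmax(np.array([40.0, 0.0]))  # large logit gap
print(moderate[0])
print(saturated[0])
```

With saturated outputs, every confident detection ties at the same score. Large logit gaps can come from an over-confident network or from inputs on a different scale than the training data, so both are worth checking.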
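I ended up handling it as follows.
The regions produced by Selective Search are not perfect: sometimes they lean toward the left of the face, sometimes toward the right. The trained classifier is not perfect either: sometimes it labels a larger region that contains the face as 1, sometimes it labels only part of the face as 1.
Borrowing the idea of the mean from statistics: these detections are all broadly acceptable, yet each has some flaw. So average the coordinates of all detected regions, letting boxes that lean left offset boxes that lean right and oversized boxes offset undersized ones. As long as the errors are roughly symmetric, the averaged box should sit closer to the true face than most individual boxes.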
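The averaging step amounts to a column-wise mean over the accepted boxes. A minimal sketch, with made-up boxes and a helper name (`average_box`) of my own:

```python
import numpy as np


def average_box(boxes):
    # boxes: list of (x1, y1, x2, y2); returns the coordinate-wise mean as ints
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes.mean(axis=0)
    return int(x1), int(y1), int(x2), int(y2)


detections = [[10, 10, 50, 50], [20, 14, 60, 54], [15, 12, 55, 52]]
print(average_box(detections))
```

The mean is sensitive to a single far-off box. A coordinate-wise median (`np.median`) would be less so, though that is not what this project uses.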
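4. Model Performance
I sampled 100 images for face detection and the final detection accuracy was about 82%, which is broadly acceptable. I also took a few photos of myself and the results were decent. When the face is neither too large nor too small in the frame and the background is not cluttered, the model is usable in real scenes.
As for why other people's face detectors perform so well (close to 100%), I have not figured it out; my own reproductions are always just passable. If I later find something that improves the results, I will come back and adjust.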
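5. Some Further Thoughts on the Results
Ques1: What are the highlights of the RCNN model?
(1) Traditional object detection relied on the sliding-window strategy, and for a long time nobody came up with a better idea. The author used region grouping to improve on sliding windows from the angle of region continuity.
The key insight: the texture and color of the object we want to detect are usually continuous, while sliding windows cut apart many regions that should not be separated, which wastes a great deal of computation. Selective Search improves on exactly this point and avoids much of that fragmentation.
(2) The author proposed using a CNN to extract features from each region proposal.
(3) The paper also made a point: when you lack large amounts of labeled data, a practical approach is transfer learning, that is, take a network trained on another large dataset and fine-tune it on the small task-specific dataset.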
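Ques2: Three key points you must get right for RCNN to work.
(1) When Selective Search generates candidate regions, make sure the segmentation gives what we need, that is, the candidates actually contain the target object. If the segmentation is poor, try tuning the hyperparameters of the felzenszwalb function (scale, sigma, min_size).
(2) The binary classifier must be accurate and robust. It must classify irrelevant background as 0, classify many incomplete fragments of the target as 0, and classify a true target region as 1 with high confidence. If classification is poor, first improve dataset quality: remove ambiguous samples from the positive and negative sets and add more training data. Second, deepen the network so it can tell incomplete target regions from complete ones.
(3) The network finally outputs several detections, and the best bounding box must be kept. Consider fitting a Logistic Regression model on the CNN features to get a probability, or using bounding-box regression for fine adjustment.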
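Ques3: One question puzzled me for a long time. Even YOLO, currently the most popular algorithm, cannot reach 100% detection accuracy, so how can object detection be used in real applications? Take vehicle detection in an unmanned workshop: if a single vehicle is missed, the resulting loss could be enormous.
In industry, object detection mostly runs on video streams, which amounts to multiplying probabilities across frames. Suppose my model's per-frame detection accuracy is only 95% and the stream has 20 frames per second. If I open the gate as soon as any 1 of those 20 frames detects the target, then, assuming the frames fail independently, the probability of missing the target in all 20 frames drops to 0.05^20, so the target is detected with probability 1 - 0.05^20. Consecutive frames are correlated in practice, so the real failure rate is higher than this, but the idea holds.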
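The arithmetic can be checked directly. The sketch assumes independent frames, which is an idealization, and the function name is mine.

```python
def miss_probability(per_frame_accuracy, frames):
    # probability that every one of `frames` independent frames misses the target
    return (1.0 - per_frame_accuracy) ** frames


p_miss_all = miss_probability(0.95, 20)
p_detect = 1.0 - p_miss_all
print(p_miss_all)   # on the order of 1e-26
print(p_detect)
```

Even at 80% per-frame accuracy the same calculation gives 0.2^20, roughly 1e-14, which is why per-frame accuracy well below 100% can still be workable on video.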
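Ques4: Shortcomings of my own RCNN face detection project.
(1) Detection is not real-time. It currently takes about 0.5 s per image, so it cannot process a video stream directly.
(2) The three main steps (candidate generation with Selective Search, binary classification with the CNN, and box selection by averaging) each work only broadly well, and none is perfect.
For example, sometimes none of the regions output by Selective Search contains the face I want. Sometimes the CNN classifies irrelevant background as 1. Sometimes a few detections are so far off that they drag down the averaged result.
This is when I finally understood why end-to-end models are favored: with many intermediate steps, a small loss at each step leaves the overall result muddled.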
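A rough way to see how the losses compound: if each stage succeeds independently with some rate, the pipeline succeeds only when all stages do. The per-stage rates below are hypothetical placeholders, not measurements from this project.

```python
def pipeline_success(stage_rates):
    # product of per-stage success rates, assuming the stages fail independently
    total = 1.0
    for rate in stage_rates:
        total *= rate
    return total


# hypothetical per-stage success rates (not measured values)
rates = {"selective_search": 0.95, "cnn_classifier": 0.95, "averaging": 0.95}
overall = pipeline_success(rates.values())
print(round(overall, 3))
```

Three stages that each look fine at 95% already leave the whole pipeline below 86%.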
6. Source Code
Generating candidate regions with Selective Search:
import cv2
import numpy as np
import skimage.segmentation
import random
import skimage.feature
# Selective Search algorithm
# step 1: calculate the first fel_segment region
# step 2: calculate the neighbour couple
# step 3: calculate the similarity dictionary
# step 4: merge regions and calculate the second merged region
# step 5: obtain the target candidate region through the second region
def intersect(a, b):
    # True if any corner of b's bounding box lies strictly inside a's bounding box
    if (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]):
        return True
    return False
def calc_similarity(r1, r2, size):
    # sim1 / sim2: histogram intersection of the color / texture histograms
    sim1 = 0
    sim2 = 0
    for a, b in zip(r1["hist_c"], r2["hist_c"]):
        sim1 = sim1 + min(a, b)
    for a, b in zip(r1["hist_t"], r2["hist_t"]):
        sim2 = sim2 + min(a, b)
    # sim3: size term, favors merging small regions first
    sim3 = 1.0 - (r1["size"] + r2["size"]) / size
    # sim4: fill term, favors pairs whose joint bounding box has little empty space
    rect_size = (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"])) * \
                (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    sim4 = 1.0 - (rect_size - r1["size"] - r2["size"]) / size
    similarity = sim1 + sim2 + sim3 + sim4
    return similarity
def merge_region(r1, r2, t):
    # merged histograms are the size-weighted average of the two regions' histograms
    new_size = r1["size"] + r2["size"]
    r_new = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": t
    }
    return r_new
# Step 1: Calculate the different categories segmented by felzenszwalb algorithm
def first_calc_fel_category(image, scale, sigma, min_size):
    fel_mask = skimage.segmentation.felzenszwalb(image, scale=scale, sigma=sigma, min_size=min_size)
    # labels start at 0, so the largest label is (number of segments - 1)
    print('The picture has been segmented in these categories : ', np.max(fel_mask))  # 0-694 categories
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # (250, 250)
    texture_img = skimage.feature.local_binary_pattern(gray_image, 8, 1.0)  # (250, 250)
    # optional visualization of the segmentation, one random color per label
    # fel_img = np.zeros((fel_mask.shape[0], fel_mask.shape[1], 3))
    # for i in range(np.max(fel_mask)):
    #     a = random.randint(0, 255)
    #     b = random.randint(0, 255)
    #     c = random.randint(0, 255)
    #     for j in range(fel_mask.shape[0]):
    #         for k in range(fel_mask.shape[1]):
    #             if fel_mask[j, k] == i:
    #                 fel_img[j, k, 0] = a
    #                 fel_img[j, k, 1] = b
    #                 fel_img[j, k, 2] = c
    #
    # cv2.namedWindow("image")
    # cv2.imshow('image', fel_img/255)
    # cv2.waitKey(0)
    # cv2.imwrite('felzenszwalb_img.jpg', fel_img)
    img_append = np.zeros((fel_mask.shape[0], fel_mask.shape[1], 4))  # (250, 250, 4)
    img_append[:, :, 0:3] = image
    img_append[:, :, 3] = fel_mask
    region = {}
    # calc the min_x, min_y, max_x, max_y, label in every category
    for y, i in enumerate(img_append):
        for x, (r, g, b, l) in enumerate(i):
            if l not in region:
                region[l] = {"min_x": 0xffff, "min_y": 0xffff, "max_x": 0, "max_y": 0, "labels": l}
            if region[l]["min_x"] > x:
                region[l]["min_x"] = x
            if region[l]["min_y"] > y:
                region[l]["min_y"] = y
            if region[l]["max_x"] < x:
                region[l]["max_x"] = x
            if region[l]["max_y"] < y:
                region[l]["max_y"] = y
    for k, v in list(region.items()):
        # calc the size feature in every category
        masked_color = image[:, :, :][img_append[:, :, 3] == k]
        region[k]["size"] = len(masked_color)
        # calc the color feature in every category
        color_bin = 6
        color_hist = np.array([])
        for colour_channel in (0, 1, 2):
            c = masked_color[:, colour_channel]
            color_hist = np.concatenate([color_hist] + [np.histogram(c, color_bin, (0.0, 255.0))[0]])
        color_hist = color_hist / sum(color_hist)
        region[k]["hist_c"] = color_hist
        # calc the texture feature in every category
        texture_bin = 10
        masked_texture = texture_img[:, :][img_append[:, :, 3] == k]
        texture_hist = np.histogram(masked_texture, texture_bin, (0.0, 255.0))[0]
        texture_hist = texture_hist / sum(texture_hist)
        region[k]["hist_t"] = texture_hist
    return region
# Step 2: Calculate the neighbour couple in the first fel_segment region
def calc_neighbour_couple(region):
    r = list(region.items())
    couples = []
    # test every unordered pair of regions once
    for cur, a in enumerate(r[:-1]):
        for b in r[cur + 1:]:
            if intersect(a[1], b[1]):
                couples.append((a, b))
    return couples
# Step 3: Calculate the sim_dictionary in the neighbour couple
def calc_sim_dictionary(couple, total_size):
    sim_dictionary = {}
    for (ai, ar), (bi, br) in couple:
        sim_dictionary[(ai, bi)] = calc_similarity(ar, br, total_size)
    return sim_dictionary
# step 4: merge the small regions and calculate the second merged region
def second_calc_merge_category(sim_dictionary, region, total_size):
    while sim_dictionary != {}:
        # pick the most similar pair of neighbouring regions
        i, j = sorted(sim_dictionary.items(), key=lambda item: item[1])[-1][0]
        # merge them into a new region with a fresh label
        t = max(region.keys()) + 1.0
        region[t] = merge_region(region[i], region[j], t)
        # drop every similarity that involves either of the merged regions
        key_to_delete = []
        for k, v in list(sim_dictionary.items()):
            if (i in k) or (j in k):
                key_to_delete.append(k)
        for k in key_to_delete:
            del sim_dictionary[k]
        # recompute similarities between the new region and its neighbours
        for k in [a for a in key_to_delete if a != (i, j)]:
            n = k[1] if k[0] in (i, j) else k[0]
            sim_dictionary[(t, n)] = calc_similarity(region[t], region[n], total_size)
    return region
# step 5: obtain the target candidate regions through the second region
def calc_candidate_box(second_region, total_size):
    category = []
    for k, r in list(second_region.items()):
        category.append({'rect': (r['min_x'], r['min_y'], r['max_x'], r['max_y']), 'size': r['size']})
    candidate_box = set()
    for r in category:
        if r['rect'] in candidate_box:
            continue
        x1, y1, x2, y2 = r['rect']
        # skip boxes that cover more than 1/3 or less than 1/16 of the image
        if (x2-x1)*(y2-y1) > total_size / 3:
            continue
        if (x2-x1)*(y2-y1) < total_size / 16:
            continue
        # skip degenerate boxes and boxes with aspect ratio above 1.5
        if (x2-x1) == 0 or (y2-y1) == 0:
            continue
        if (y2-y1) / (x2-x1) > 1.5 or (x2-x1) / (y2-y1) > 1.5:
            continue
        candidate_box.add(r['rect'])
    return candidate_box
Building the training dataset:
import cv2
import numpy as np
import random
import os
# face_image: [1, 0]
# background_image: [0, 1]
def read_data():
    # 3674 + 4440 : train data from selective search algorithm
    # 4000 + 3000 : train data from online download dataset
    n = 15114
    data_x = np.zeros((n, 96, 96))
    data_y = np.zeros((n, 2))
    # face regions selected from the Selective Search output
    filename = os.listdir("/home/archer/CODE/PF/selective_train_face")
    filename.sort()
    i = 0
    for name in filename:
        face_image = cv2.imread("/home/archer/CODE/PF/selective_train_face/" + name)
        face_gray_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
        face_resize_image = cv2.resize(face_gray_image, (96, 96), interpolation=cv2.INTER_AREA)
        data_x[i, :, :] = face_resize_image / 255
        data_y[i, :] = np.array([1, 0])
        i = i + 1
    print('the selective_train_face has been loaded : ', i)
    # standard face images (files named 1.jpg ... 4000.jpg)
    for k in range(4000):
        face_image = cv2.imread("/home/archer/CODE/PF/download_train_face/" + str(k+1) + '.jpg')
        face_gray_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
        face_resize_image = cv2.resize(face_gray_image, (96, 96), interpolation=cv2.INTER_AREA)
        data_x[i, :, :] = face_resize_image / 255
        data_y[i, :] = np.array([1, 0])
        i = i + 1
    print('the download_train_face has been loaded : ', i)
    # non-face regions selected from the Selective Search output
    filename = os.listdir("/home/archer/CODE/PF/selective_train_background")
    filename.sort()
    for name in filename:
        background_image = cv2.imread("/home/archer/CODE/PF/selective_train_background/" + name)
        background_gray_image = cv2.cvtColor(background_image, cv2.COLOR_BGR2GRAY)
        background_resize_image = cv2.resize(background_gray_image, (96, 96), interpolation=cv2.INTER_AREA)
        data_x[i, :, :] = background_resize_image / 255
        data_y[i, :] = np.array([0, 1])
        i = i + 1
    print('the selective_train_background has been loaded : ', i)
    # landscape images used as negatives (files named 1.jpg ... 3000.jpg)
    for k in range(3000):
        background_image = cv2.imread("/home/archer/CODE/PF/download_train_background/" + str(k+1) + '.jpg')
        background_gray_image = cv2.cvtColor(background_image, cv2.COLOR_BGR2GRAY)
        background_resize_image = cv2.resize(background_gray_image, (96, 96), interpolation=cv2.INTER_AREA)
        data_x[i, :, :] = background_resize_image / 255
        data_y[i, :] = np.array([0, 1])
        i = i + 1
    print('the download_train_background has been loaded : ', i)
    return data_x, data_y
def make_train_data():
    # train number : n1 , test number : n2
    n1 = 12000
    n2 = 3114
    n = n1 + n2
    data_x, data_y = read_data()
    # shuffle the sample indices, then take the first n1 for training and the rest for testing
    random_index = np.arange(0, n, 1)
    random.shuffle(random_index)
    train_x = np.zeros((n1, 96, 96))
    train_y = np.zeros((n1, 2))
    test_x = np.zeros((n2, 96, 96))
    test_y = np.zeros((n2, 2))
    for i in range(n1):
        index = random_index[i]
        train_x[i, :, :] = data_x[index, :, :]
        train_y[i, :] = data_y[index, :]
    for i in range(n2):
        index = random_index[n1 + i]
        test_x[i, :, :] = data_x[index, :, :]
        test_y[i, :] = data_y[index, :]
    return train_x, train_y, test_x, test_y
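The same shuffle-and-split can be written without explicit copy loops by indexing with a permutation. This is an alternative sketch, not the project's code: `shuffle_split`, the fixed seed and the tiny stand-in arrays are my own choices.

```python
import numpy as np


def shuffle_split(data_x, data_y, n_train, seed=0):
    # draw one random permutation and use it for both images and labels,
    # so every image stays paired with its label
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data_x))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return data_x[train_idx], data_y[train_idx], data_x[test_idx], data_y[test_idx]


# tiny stand-in arrays: 10 "images" of 4x4 and alternating one-hot labels
x = np.arange(10 * 4 * 4, dtype=float).reshape(10, 4, 4)
y = np.eye(2)[np.arange(10) % 2]
train_x, train_y, test_x, test_y = shuffle_split(x, y, n_train=8)
print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)
```

With the real arrays the call would be `shuffle_split(data_x, data_y, n_train=12000)`.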
Network architecture:
import numpy as np
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.models import load_model
import matplotlib.pyplot as plt
def create_network():
    np.random.seed(1)
    model = Sequential()
    # three conv + pool stages
    model.add(Conv2D(16, (5, 5), input_shape=(96, 96, 1), strides=1))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Activation('relu'))
    # input_shape is only needed on the first layer
    model.add(Conv2D(32, (5, 5), strides=1))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3), strides=1))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Activation('relu'))
    # two conv layers without pooling
    model.add(Conv2D(128, (3, 3), strides=1))
    model.add(Activation('relu'))
    model.add(Conv2D(256, (3, 3), strides=1))
    model.add(Activation('relu'))
    # fully connected classifier: [face, background]
    model.add(Flatten())
    model.add(Dense(4096))
    model.add(Activation('relu'))
    model.add(Dense(200))
    model.add(Activation('relu'))
    model.add(Dense(2))
    model.add(Activation('softmax'))
    model.summary()
    return model
# batch generator: reduce the consumption of computer memory
def generator(train_x, train_y, batch_size):
    while 1:
        # sample a random batch (with replacement) on every step
        row = np.random.randint(0, len(train_x), size=batch_size)
        x = train_x[row]
        y = train_y[row]
        yield x, y
# create model and train and save
def train_network(train_x, train_y, test_x, test_y, epoch, batch_size):
    # add the channel axis: (n, 96, 96) -> (n, 96, 96, 1)
    train_x = train_x[:, :, :, np.newaxis]
    test_x = test_x[:, :, :, np.newaxis]
    model = create_network()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # note: fit_generator is deprecated in recent Keras releases, where model.fit accepts a generator directly
    history = model.fit_generator(generator(train_x, train_y, batch_size), epochs=epoch,
                                  steps_per_epoch=len(train_x) // batch_size)
    model.save('first_model.h5')
    calculate_test_accuracy(test_x, test_y, 'first_model.h5')
# Load the partially trained model and continue training and save
def load_network_then_train(train_x, train_y, test_x, test_y, epoch, batch_size, input_name, output_name):
    train_x = train_x[:, :, :, np.newaxis]
    test_x = test_x[:, :, :, np.newaxis]
    model = load_model(input_name)
    history = model.fit_generator(generator(train_x, train_y, batch_size),
                                  epochs=epoch, steps_per_epoch=len(train_x) // batch_size)
    model.save(output_name)
    calculate_test_accuracy(test_x, test_y, output_name)
# plot the loss and the accuracy
def show_plot(history):
    # list all data in history
    print(history.history.keys())
    plt.plot(history.history['loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.savefig('loss1.jpg')
    plt.show()
    plt.plot(history.history['accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.savefig('accuracy1.jpg')
    plt.show()
# calculate the accuracy in test set
def calculate_test_accuracy(test_x, test_y, output_name):
    model = load_model(output_name)
    test_result = model.predict(test_x)
    accuracy_number = 0
    for i in range(len(test_x)):
        # predicted face and labelled face
        if np.argmax(test_result[i, :]) == 0 and test_y[i, 0] == 1:
            accuracy_number = accuracy_number + 1
        # predicted background and labelled background
        if np.argmax(test_result[i, :]) == 1 and test_y[i, 0] == 0:
            accuracy_number = accuracy_number + 1
    print('The accuracy in test set is :')
    print(accuracy_number/len(test_x))
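The counting loop in `calculate_test_accuracy` is equivalent to comparing the argmax of the predictions with the argmax of the one-hot labels. A vectorized sketch with made-up probabilities (the helper name `one_hot_accuracy` is mine):

```python
import numpy as np


def one_hot_accuracy(pred_probs, true_one_hot):
    # a sample is correct when the most probable class matches the labelled class
    pred_class = np.argmax(pred_probs, axis=1)
    true_class = np.argmax(true_one_hot, axis=1)
    return float(np.mean(pred_class == true_class))


probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(one_hot_accuracy(probs, labels))
```

In the project this would be called as `one_hot_accuracy(model.predict(test_x), test_y)`.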
Calling everything from main:
import get_data as gt
import network as nt
import numpy as np
from keras.models import load_model
import os
import cv2
import selective_search as ss
if __name__ == "__main__":
    train_x, train_y, test_x, test_y = gt.make_train_data()
    # nt.train_network(train_x, train_y, test_x, test_y, epoch=1, batch_size=32)
    model = load_model('best_model.h5')
    nt.calculate_test_accuracy(test_x[:, :, :, np.newaxis], test_y, 'best_model.h5')
    filename = os.listdir("/home/archer/CODE/PF/demo1")
    filename.sort()
    for name in filename:
        img = cv2.imread("/home/archer/CODE/PF/demo1/" + name)
        img = np.float32(img)
        total_size = img.shape[0] * img.shape[1]
        # Selective Search: segment, pair neighbours, score, merge, filter
        first_region = ss.first_calc_fel_category(img, scale=10, sigma=0.9, min_size=100)
        neighbour_couple = ss.calc_neighbour_couple(first_region)
        sim_dictionary = ss.calc_sim_dictionary(neighbour_couple, total_size)
        second_region = ss.second_calc_merge_category(sim_dictionary, first_region, total_size)
        candidate_box = ss.calc_candidate_box(second_region, total_size)
        a1, b1, a2, b2 = [0, 0, 0, 0]
        # record the well detected region
        bounding_box = []
        for (x1, y1, x2, y2) in candidate_box:
            select_img1 = img[y1:y2, x1:x2]
            select_img2 = cv2.cvtColor(select_img1, cv2.COLOR_BGR2GRAY)
            select_img3 = cv2.resize(select_img2, (96, 96), interpolation=cv2.INTER_AREA)
            # scale to [0, 1] so the input matches the training data built in read_data()
            select_img4 = (select_img3 / 255)[np.newaxis, :, :, np.newaxis]
            pre = model.predict(select_img4)
            # column 0 is the face probability (face label is [1, 0])
            if pre[0, 0] > 0.8:
                bounding_box.append([x1, y1, x2, y2])
        # calculate the average, and ensure the stability
        if len(bounding_box) > 0:
            bounding_box = np.array(bounding_box)
            print(bounding_box)
            a1 = int(np.mean(bounding_box[:, 0]))
            b1 = int(np.mean(bounding_box[:, 1]))
            a2 = int(np.mean(bounding_box[:, 2]))
            b2 = int(np.mean(bounding_box[:, 3]))
        print(a1, b1, a2, b2, '***************************')
        cv2.rectangle(img, (a1, b1), (a2, b2), (0, 0, 255), 2)
        cv2.imwrite("/home/archer/CODE/PF/demo_detection/" + name, img)
7. Project Link
If the code does not run, or you want to use the trained model directly, you can download it from the project link:
https://blog.csdn.net/Twilight737