![a9e32a948423089a3c510011280ae2cf.png](https://i-blog.csdnimg.cn/blog_migrate/36abefe38b38080955943d8e142030ca.jpeg)
Preface
Faster R-CNN is a landmark Two-Stage deep-learning object detection algorithm, and the ideas it introduced still appear in many of today's networks. For a theoretical walkthrough of Faster R-CNN, see the post below.
馨意: 深度学习目标检测Faster R-CNN论文解读 (zhuanlan.zhihu.com)
Faster R-CNN Code
Here we use bubbliiiing's implementation:
bubbliiiing/faster-rcnn-keras (github.com)
For a walkthrough of the code, see the author's Bilibili video and CSDN write-up:
https://www.bilibili.com/video/BV1U7411T72r?p=11
睿智的目标检测18--Keras搭建Faster-RCNN目标检测平台 (blog.csdn.net)
Pedestrian Video Data: Download and Preprocessing
Official download
Video data: bbenfold_headpose/Datasets/TownCentreXVID.avi
Annotation data: bbenfold_headpose/Datasets/TownCentre-groundtruth.top
Baidu Netdisk
Link: https://pan.baidu.com/s/1P2OrgUuGYBqDmwAMEAqQbw
Extraction code: 4ms9
Data description
The dataset consists of one video, TownCentreXVID.avi, and one label file, TownCentre-groundtruth.top. The video is 5 minutes long at 25 frames per second (1920×1080), for a total of 7500 frames. TownCentre-groundtruth.top contains pedestrian locations for the first 4500 frames; each line is organized as follows:
- personNumber - unique identifier of the person
- frameNumber - frame index (counted from 0)
- headValid - 1 if the head region is valid, 0 otherwise
- bodyValid - 1 if the body region is valid, 0 otherwise
- headLeft, headTop, headRight, headBottom - head bounding box (in pixels)
- bodyLeft, bodyTop, bodyRight, bodyBottom - body bounding box (in pixels)
For pedestrian detection we mainly need the frameNumber and the body bounding-box fields above.
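As a quick check of the field layout, a single annotation line can be split into these fields. This is a minimal sketch; the sample line below is made up for illustration, not copied from the dataset:

```python
# Split one TownCentre-groundtruth.top line into its named fields.
# The sample line is illustrative only, not taken from the real file.
line = "0, 0, 1, 1, 434.0, 122.0, 459.0, 153.0, 419.0, 111.0, 476.0, 261.0"
fields = [s.strip() for s in line.split(",")]
personNumber, frameNumber = int(fields[0]), int(fields[1])
headValid, bodyValid = int(fields[2]), int(fields[3])
head = [float(v) for v in fields[4:8]]   # headLeft, headTop, headRight, headBottom
body = [float(v) for v in fields[8:12]]  # bodyLeft, bodyTop, bodyRight, bodyBottom
print(frameNumber, body)
```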
Preprocessing
The raw dataset is a video, but our training code consumes images. Before training, use OpenCV's VideoCapture to extract frames from TownCentreXVID.avi, halve their resolution, and save them to JPEGImages and JPEGImages_test. Reference code:
```python
import cv2
import os

# Use OpenCV's VideoCapture to extract frames from TownCentreXVID.avi,
# saving them to JPEGImages (train) and JPEGImages_test (test)
def video2im(video_name, train_path='JPEGImages', test_path='JPEGImages_test', factor=2):
    os.makedirs(train_path, exist_ok=True)
    os.makedirs(test_path, exist_ok=True)
    frame = 0
    cap = cv2.VideoCapture(video_name)
    length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print('Total Frame Count:', length)
    while True:
        check, img = cap.read()
        if check:
            # The labeled frames go to the training folder, the rest to the test folder
            if frame <= 4500:
                path = train_path
            else:
                path = test_path
            img = cv2.resize(img, (1920 // factor, 1080 // factor))
            cv2.imwrite(os.path.join(path, str(frame) + ".jpg"), img)
            frame += 1
            print('Processed:', frame)
        else:
            break
    cap.release()

video2im("TownCentreXVID.avi")
```
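The split and resize rules can be sanity-checked without the video itself. The helper below merely mirrors the arithmetic in video2im; frame_destination is a hypothetical name of ours, not part of the repo:

```python
# Mirror video2im's split rule and output size for a given frame index
# (pure arithmetic; frame_destination is a hypothetical helper, not repo code)
def frame_destination(frame, factor=2, cutoff=4500):
    folder = 'JPEGImages' if frame <= cutoff else 'JPEGImages_test'
    return folder, str(frame) + '.jpg', (1920 // factor, 1080 // factor)

print(frame_destination(0))
print(frame_destination(5000))
```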
According to the training code, we need to generate a training txt. Each line holds an image path followed by boxes in xmin,ymin,xmax,ymax,class order; since we only detect pedestrians, the class is always 0:

```
# image path xmin,ymin,xmax,ymax,class xmin,ymin,xmax,ymax,class ...
JPEGImages/0.jpg 141,144,184,245,0 142,114,181,207,0 358,109,392,203,0 395,116,429,212,0 437,195,477,313,0 327,496,394,692,0 802,299,857,445,0 806,444,873,637,0 894,86,937,178,0 821,55,855,137,0 728,13,754,82,0 679,4,703,70,0 444,35,470,109,0
JPEGImages/1.jpg 139,145,181,247,0 144,114,184,206,0
...
```
The code to generate this txt:

```python
def read_top(file):
    with open(file, 'r') as fp:
        lines = fp.readlines()
    lastframe = -1
    train_file = open('train.txt', 'w')
    for line in lines:
        line = line.strip().split(",")
        frameNumber = int(line[1])
        # Halve the body-box coordinates to match the half-size frames
        xmin = int(abs(float(line[-4]) - 2) / 2)
        ymin = int(abs(float(line[-3]) - 2) / 2)
        xmax = int(abs(float(line[-2]) - 2) / 2)
        ymax = int(abs(float(line[-1]) - 2) / 2)
        if lastframe != frameNumber:
            # A new frame: start a new line with the image path
            if lastframe != -1:
                train_file.write('\n')
            train_file.write("JPEGImages/%s.jpg" % frameNumber)
            lastframe = frameNumber
        # Append this person's box; the class is always 0 (person)
        train_file.write(' %s,%s,%s,%s,0' % (xmin, ymin, xmax, ymax))
    train_file.close()

read_top(r"TownCentre-groundtruth.top")
```
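To sanity-check the generated file, each line can be split back into an image path and a list of boxes. A minimal sketch; the sample line below is illustrative, not taken from the real train.txt:

```python
# Split one train.txt line back into (image path, list of boxes);
# the sample line is illustrative, not from the real file
line = "JPEGImages/0.jpg 141,144,184,245,0 142,114,181,207,0"
parts = line.split()
img_path = parts[0]
boxes = [tuple(int(v) for v in box.split(',')) for box in parts[1:]]
print(img_path, boxes)
```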
Training Code Changes
Since our only class is the pedestrian (person), set NUM_CLASSES to 2 (person + background). Setting EPOCH to 20 is enough; 100 takes too long. Change annotation_path to the txt we just generated. Then we can happily start training.

```python
NUM_CLASSES = 2
EPOCH = 20
annotation_path = r"MyDataset_TownCentre/train.txt"
```
Model Prediction
In frcnn.py, change model_path in _defaults to the lowest-loss h5 model we trained, and classes_path to a txt file whose only content is person.
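The classes file can be generated in one line. The filename person_classes.txt here is an assumed example; use whatever path you point classes_path at:

```python
# Write a classes file whose only entry is person
# ('person_classes.txt' is an assumed example filename)
with open('person_classes.txt', 'w') as f:
    f.write('person\n')
print(open('person_classes.txt').read().split())
```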
The original predict.py asks for an image path interactively:

```python
from frcnn import FRCNN
from PIL import Image

frcnn = FRCNN()
# Interactive prediction: type an image path at the prompt
while True:
    img = input('Input image filename:')
    try:
        image = Image.open(img)
    except:
        print('Open Error! Try again!')
        continue
    else:
        r_image = frcnn.detect_image(image)
        r_image.show()
        r_image.save('result1.jpg', quality=95)
frcnn.close_session()
```
Our modified single-file prediction:

```python
from frcnn import FRCNN
from PIL import Image

frcnn = FRCNN()
# Single-file prediction
img = 'img/street.jpg'
image = Image.open(img)
r_image = frcnn.detect_image(image)
r_image.show()
r_image.save('result2.jpg', quality=95)
frcnn.close_session()
```
Our modified batch prediction:

```python
from frcnn import FRCNN
from PIL import Image
import os

frcnn = FRCNN()
# Batch prediction: run detection on every image in ReadPath
ReadPath = r"JPEGImages_test"
SavePath = r"JPEGImages_result"
os.makedirs(SavePath, exist_ok=True)
for name in os.listdir(ReadPath):
    image = Image.open(os.path.join(ReadPath, name))
    r_image = frcnn.detect_image(image)
    # r_image.show()
    r_image.save(os.path.join(SavePath, name), quality=95)
frcnn.close_session()
```
![903ac70272394a72c91c42bbd5a4bb16.png](https://i-blog.csdnimg.cn/blog_migrate/73ce1858380f26d768f0050414d475ad.jpeg)
![2e40d7295391980761a40ba68261fae5.png](https://i-blog.csdnimg.cn/blog_migrate/656e64409f563a55f9dc9efb99238d5e.jpeg)
Finally, we convert the predicted image frames back into a video:

```python
import os
import cv2
import numpy as np

path = r"JPEGImages_result"
filelist = [f for f in os.listdir(path) if f.endswith('.jpg')]
# os.listdir order is arbitrary; sort by the numeric frame index in the filename
filelist.sort(key=lambda name: int(os.path.splitext(name)[0]))
fps = 25            # the source video runs at 25 frames per second
size = (960, 540)   # size of the frames being written
video = cv2.VideoWriter("result.avi", cv2.VideoWriter_fourcc('I', '4', '2', '0'), fps, size)
for item in filelist:
    img_name = os.path.join(path, item)
    print(img_name)
    img = cv2.imread(img_name)
    if img is None:
        # Fall back to imdecode for paths with non-ASCII characters
        img = cv2.imdecode(np.fromfile(img_name, dtype=np.uint8), -1)
    video.write(img)
video.release()
cv2.destroyAllWindows()
print('end')
```
References
https://www.bilibili.com/video/BV1U7411T72r?p=11
睿智的目标检测18--Keras搭建Faster-RCNN目标检测平台 (blog.csdn.net)