Reference links:
Preface:
This project runs pedestrian detection with a YOLO v3 model already trained on the COCO dataset, then feeds each detected person into an AlexNet model trained for upper-body (shirt) color recognition. The AlexNet color classifier can be trained on the RAP dataset; the pitfalls of the RAP dataset are covered in detail below. Final results: AlexNet reaches a validation accuracy of roughly 76% [it might be worth swapping in VGG16, which could work better], and each frame takes 400+ seconds at test time.
Requirements:
ubuntu14.04 [note: Ubuntu 16.04 is better, since it supports CUDA 9.0]
python3.6.1
keras2.1.2
tensorflow-gpu==1.4.1
I. Procedure:
1) Following the Keras version of the YOLO v3 video-detection code, box out the pedestrians' positions
2) Train the upper-body color recognition model; the original version [reference link 2] recognizes upper-body color + upper-body clothing category, so I removed the clothing-category branch from the network and then trained it
3) Feed the person boxes obtained in step 1 into the model trained in step 2, then return to the YOLO v3 video detection and keep running (a rough glue sketch follows this list)
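As a rough sketch of that step-3 glue, assuming hypothetical names (color_model, class_names, a 227x227 input, simple /255 normalization) rather than the project's exact code:

# Hypothetical glue between the YOLO v3 detector and the color classifier.
# `frame` is a BGR frame read by OpenCV, `box` is one "person" box from
# YOLO v3, and `color_model` is a Keras classifier whose single softmax
# head covers the 12 RAP shirt colors.
import cv2
import numpy as np

def classify_shirt_color(frame, box, color_model, class_names):
    left, top, right, bottom = box
    person = frame[top:bottom, left:right]            # crop the pedestrian
    person = cv2.cvtColor(person, cv2.COLOR_BGR2RGB)  # classifier expects RGB
    person = cv2.resize(person, (227, 227)) / 255.0   # AlexNet-style input
    probs = color_model.predict(np.expand_dims(person, axis=0))[0]
    return class_names[int(np.argmax(probs))]         # e.g. 'Black'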
II. Pitfalls of the RAP dataset:
First, the RAP labels come as a .mat file, which is awkward to work with. Luckily there is code on GitHub that converts it to XML format, ready to use directly.
You can also just use my version below, which additionally counts the color categories and the number of samples per upper-body color.
1) The upper-body color categories in the RAP dataset:
# -*- coding:utf-8 -*-
__author__ = 'xuy'
# This script extracts the relevant annotations from the .mat file of the RAP
# dataset, writes one Pascal-VOC-style XML file per image, and counts the
# number of samples for each upper-body (shirt) color.
import os

import cv2
import numpy as np
import pandas as pd
import scipy.io

shirt_color = {}  # color name -> number of samples


def loadmat_and_extract(file, root_dir):
    # Load the .mat file. RAP_annotation holds 7 variables: imagesname,
    # position, label, partion, attribute_chinese, attribute_eng, attribute_exp.
    mat = scipy.io.loadmat(file)
    print(mat.keys())
    data = mat['RAP_annotation']
    images = data['imagesname']
    labels = data['label']
    eng_attr = data['attribute_eng']
    pos = data['position']

    # Attribute index ranges of interest:
    #  0     -> gender
    #  1-3   -> age
    #  15-23 -> upper body
    #  24-29 -> lower body
    #  35-42 -> attachments/accessories
    #  51-54 -> face direction
    #  55-62 -> occlusion
    #  63-74 -> upper-body color
    #  75-82 -> lower-body color

    # Put the wordy attribute names in place of the 1's in the label matrix.
    req_labels = labels[0][0].astype(str)
    for imgnum in range(0, len(req_labels)):
        for lblnum in range(0, len(req_labels[imgnum])):
            if req_labels[imgnum][lblnum] == '1':
                req_labels[imgnum][lblnum] = eng_attr[0][0][lblnum][0][0]

    # For now keep gender, upper body, lower body, face direction,
    # upper color and lower color.
    req_labels2 = []
    lbl_idx = [0] + list(range(15, 23 + 1)) + list(range(24, 29 + 1)) + \
        list(range(51, 54 + 1)) + list(range(63, 74 + 1)) + list(range(75, 82 + 1))
    for imgnum in range(0, len(req_labels)):
        temp_lbl = []
        for i in range(0, 92):  # iterate over all attribute columns
            if i == 0 and req_labels[imgnum][i] == '0':
                temp_lbl.append("Male")
            elif i == 0 and req_labels[imgnum][i] == '2':
                temp_lbl.append("Unknown")
            elif i in lbl_idx:
                temp_lbl.append(req_labels[imgnum][i])
        req_labels2.append(np.asarray(temp_lbl).reshape(-1, 1))

    # Image names: drop the ".png" extension and normalize '-' to '_'.
    img_names = []
    for i in range(0, len(images[0][0])):
        renamed = str(images[0][0][i][0][0][:-4]).replace('-', '_')
        img_names.append(renamed)

    # Read each image once to record its size.
    print("extracting images from root dir %s to get image sizes" % root_dir)
    width = []
    height = []
    for l in range(0, len(img_names)):
        file_loc = root_dir + str(img_names[l] + ".png")
        print(file_loc)
        img = cv2.imread(file_loc, 0)
        if img is None:  # guard against missing or unreadable files
            raise IOError("cannot read image: " + file_loc)
        height.append(img.shape[0])
        width.append(img.shape[1])

    # Convert the (x, y, w, h) boxes into xmin/ymin/xmax/ymax.
    # fb = full body, hs = head-shoulder, ub = upper body, lb = lower body
    bbox = list(pos[0][0])
    fb_xmin, fb_ymin, fb_xmax, fb_ymax = [], [], [], []
    hs_xmin, hs_ymin, hs_xmax, hs_ymax = [], [], [], []
    ub_xmin, ub_ymin, ub_xmax, ub_ymax = [], [], [], []
    lb_xmin, lb_ymin, lb_xmax, lb_ymax = [], [], [], []
    for i in range(0, len(bbox)):
        fb_xmin.append(bbox[i][0])
        fb_ymin.append(bbox[i][1])
        fb_xmax.append(bbox[i][2] + bbox[i][0])
        fb_ymax.append(bbox[i][3] + bbox[i][1])
        hs_xmin.append(bbox[i][4])
        hs_ymin.append(bbox[i][5])
        hs_xmax.append(bbox[i][6] + bbox[i][4])
        hs_ymax.append(bbox[i][7] + bbox[i][5])
        ub_xmin.append(bbox[i][8])
        ub_ymin.append(bbox[i][9])
        ub_xmax.append(bbox[i][10] + bbox[i][8])
        ub_ymax.append(bbox[i][11] + bbox[i][9])
        lb_xmin.append(bbox[i][12])
        lb_ymin.append(bbox[i][13])
        lb_xmax.append(bbox[i][14] + bbox[i][12])
        lb_ymax.append(bbox[i][15] + bbox[i][13])

    # Save the attribute list.
    attr = []
    for i in lbl_idx:
        attr.append(eng_attr[0][0][i][0][0])
    df3 = pd.DataFrame(data={'labels': attr}, index=lbl_idx)
    df3.to_csv("attributes.csv")

    # Put all the data into one dataframe.
    data2 = {'images': img_names, 'labels': req_labels2, 'width': width, 'height': height,
             'fb_xmin': fb_xmin, 'fb_xmax': fb_xmax, 'fb_ymin': fb_ymin, 'fb_ymax': fb_ymax,
             'ub_xmin': ub_xmin, 'ub_xmax': ub_xmax, 'ub_ymin': ub_ymin, 'ub_ymax': ub_ymax,
             'hs_xmin': hs_xmin, 'hs_xmax': hs_xmax, 'hs_ymin': hs_ymin, 'hs_ymax': hs_ymax,
             'lb_xmin': lb_xmin, 'lb_xmax': lb_xmax, 'lb_ymin': lb_ymin, 'lb_ymax': lb_ymax}
    df = pd.DataFrame(data=data2)
    return df


def annotate(df):
    # Convert the .mat annotations into one Pascal-VOC-style XML file per image.
    os.makedirs("annotations", exist_ok=True)  # make sure the output dir exists
    for row in df.itertuples():
        xmlData = open("annotations/" + str(row.images) + ".xml", 'w')
        xmlData.write('<?xml version="1.0"?>' + "\n")
        xmlData.write('<annotation>' + "\n")
        xmlData.write('    <folder>RAP_dataset/</folder>' + "\n")
        xmlData.write('    <filename>' + str(row.images) + '.png</filename>' + "\n")
        xmlData.write('    <size>' + "\n")
        xmlData.write('        <width>' + str(row.width) + '</width>' + "\n")
        xmlData.write('        <height>' + str(row.height) + '</height>' + "\n")
        xmlData.write('        <depth>3</depth>' + "\n")
        xmlData.write('    </size>' + "\n")
        for i in range(0, len(row.labels)):
            ext_lbl = str(row.labels[i]).replace("[", "").replace("]", "").replace("'", "")
            if ext_lbl != "0":  # '0' means the attribute is absent
                xmlData.write('    <object>' + "\n")
                xmlData.write('        <name>' + str(ext_lbl) + '</name>' + "\n")
                xmlData.write('        <pose>Unknown</pose>' + "\n")
                xmlData.write('        <truncated>0</truncated>' + "\n")
                xmlData.write('        <difficult>0</difficult>' + "\n")
                # Pick the box that belongs to this attribute family.
                box = None
                if row.labels[i][0][:2] == 'Ma' or row.labels[i][0][:2] == 'Fe':
                    box = (row.fb_xmin, row.fb_ymin, row.fb_xmax, row.fb_ymax)
                if row.labels[i][0][:2] == 'up' or row.labels[i][0][:2] == 'ub':
                    if row.labels[i][0][:2] == 'up':
                        # 'up-Black' -> 'Black': slice off the 'up-' prefix
                        color = row.labels[i][0][3:]
                        shirt_color[color] = shirt_color.get(color, 0) + 1
                    box = (row.ub_xmin, row.ub_ymin, row.ub_xmax, row.ub_ymax)
                if row.labels[i][0][:3] == 'low' or row.labels[i][0][:2] == 'lb':
                    box = (row.lb_xmin, row.lb_ymin, row.lb_xmax, row.lb_ymax)
                if row.labels[i][0][:2] == 'fa':  # face direction -> head-shoulder box
                    box = (row.hs_xmin, row.hs_ymin, row.hs_xmax, row.hs_ymax)
                if box is not None:
                    xmlData.write('        <bndbox>' + "\n")
                    xmlData.write('            <xmin>' + str(box[0]) + '</xmin>' + "\n")
                    xmlData.write('            <ymin>' + str(box[1]) + '</ymin>' + "\n")
                    xmlData.write('            <xmax>' + str(box[2]) + '</xmax>' + "\n")
                    xmlData.write('            <ymax>' + str(box[3]) + '</ymax>' + "\n")
                    xmlData.write('        </bndbox>' + "\n")
                xmlData.write('    </object>' + "\n")
        xmlData.write('</annotation>' + "\n")
        xmlData.close()


file = 'RAP/RAP_annotation/RAP_annotation.mat'
root_dir = 'RAP/RAP_dataset/'
RAP_anno = loadmat_and_extract(file, root_dir)
annotate(RAP_anno)

# Print the sample count of every shirt color found.
for word in shirt_color:
    print('{} {}'.format(word, shirt_color[word]))
# Example of a generated file:
# CAM21_2014_02_26_20140226111426_20140226112822_tarid136_frame1728_line1.xml
Running it, we find that there are 12 colors in total:
White 7837
Red 4489
Black 21680
Mixture 7006
Green 1951
Gray 9311
Brown 1197
Yellow 1762
Blue 5713
Pink 1104
Purple 718
Orange 485
The RAP upper-body colors are not named with numbers but as up-[color] attributes, so you have to slice off everything after the 'up-' prefix. You also have to handle exceptional cases: not every dataset carries upper-clothing attributes on every sample, and I read several files that returned None and debugged for ages before finding that. The same issue is worth watching for in other projects too.
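A small sketch of that parsing plus the guard (the helper name is made up):

# Hypothetical helper showing the 'up-[color]' parsing with a None guard.
def parse_upper_color(attr):
    # 'up-Black' -> 'Black'; returns None when the attribute is absent
    if not isinstance(attr, str) or not attr.startswith('up-'):
        return None
    return attr[3:]

assert parse_upper_color('up-Black') == 'Black'
assert parse_upper_color('low-Blue') is None  # lower-body colors are ignored
assert parse_upper_color(None) is None        # missing attribute is tolerated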
2) The upper-body box coordinates
You'll notice the upper-body box coordinates look strange: the image is only so big, yet the upper-body box exceeds the image's pixel range. That is because the upper-body box shares the same coordinate frame as the Male/Female (full-body) box rather than the cropped image, so you need to subtract the full-body offsets.
e.g.:
<object>
<name>Male</name>
<pose>Unknown</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>639</xmin>
<ymin>276</ymin>
<xmax>701</xmax>
<ymax>417</ymax>
</bndbox>
</object>
<object>
=============================
<object>
<name>up-Black</name>
<pose>Unknown</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>646</xmin>
<ymin>292</ymin>
<xmax>698</xmax>
<ymax>357</ymax>
</bndbox>
</object>
<object>
So this xmin's position inside the image is 646 - 639 + 1 = 8, and the other coordinates follow the same pattern.
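Plugging in the numbers from the XML above (a worked example, not project code):

# Map the up-Black box from scene coordinates into the cropped person image.
fb_xmin, fb_ymin = 639, 276                              # Male full-body box
ub_xmin, ub_ymin, ub_xmax, ub_ymax = 646, 292, 698, 357  # up-Black box
rel_box = (ub_xmin - fb_xmin + 1,  # 646 - 639 + 1 = 8
           ub_ymin - fb_ymin + 1,  # 292 - 276 + 1 = 17
           ub_xmax - fb_xmin + 1,  # 698 - 639 + 1 = 60
           ub_ymax - fb_ymin + 1)  # 357 - 276 + 1 = 82
print(rel_box)  # (8, 17, 60, 82)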
3) What the RAP dataset is for:
I asked the authors: this dataset is only meant for image classification and cannot be used for pedestrian detection [the images are not full scene shots], so the frame numbers in the file names are of no use.
III. Some lessons learned:
1) Always check for exceptional cases when calling functions, in particular guard against empty/None values. The dataset is large and contains special cases; you must handle them, or you will debug for ages without finding where the problem is.
2) The PIL vs. OpenCV issue
Python's PIL and cv2 are both image-processing libraries, but there is one obvious difference between them, so conversion is needed:
PIL's three-channel encoding order is: RGB
OpenCV's three-channel encoding order is: BGR
If you run into this kind of problem when stitching projects together, remember to convert.
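A minimal conversion sketch (the file name person.png is an assumption):

# Converting between PIL (RGB) and OpenCV (BGR) images.
import cv2
import numpy as np
from PIL import Image

pil_img = Image.open('person.png')                                    # RGB
cv_img = cv2.cvtColor(np.asarray(pil_img), cv2.COLOR_RGB2BGR)         # RGB -> BGR
pil_again = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))  # back to RGB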
3) The difference between a PIL ROI and an OpenCV ROI:
PIL crops an ROI with:
roi = img.crop((left, top, right, bottom))  # note: double parentheses, the box is one tuple argument
OpenCV takes an ROI with NumPy slicing:
roi = img[top:bottom, left:right]
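Both styles cut out the same region; a quick check with made-up coordinates:

# Both ROI styles extract the same 100 x 200 pixel region (assumed box).
import cv2
from PIL import Image

left, top, right, bottom = 10, 20, 110, 220
pil_roi = Image.open('person.png').crop((left, top, right, bottom))  # PIL tuple
cv_roi = cv2.imread('person.png')[top:bottom, left:right]            # NumPy slice
print(pil_roi.size)   # (100, 200) -> (width, height)
print(cv_roi.shape)   # (200, 100, 3) -> (height, width, channels)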