1、Label your own dataset with labelImg (videos must first be converted into individual frames; one conversion tool is Convert to JPG - Convert images, documents and videos to JPG (img2go.com)), marking the parts to be tracked. When labeling is finished, an XML file is generated for each image, i.e. VOC format.
2、Generate a gt.txt file for the whole dataset from the XML (VOC-format) files. Each line of gt.txt has the format:
<frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf>
Here <frame> is the frame in which the target appears and <id> is the ID of the tracklet the target belongs to. The next four values give the target's bounding box in 2D frame coordinates, expressed as the top-left corner plus the box width and height. <conf> marks whether the target should be considered (1) or ignored (0); in this dataset all annotated targets are considered, so the value is always 1.
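For illustration, a gt.txt for two objects tracked across two frames might look like this (the coordinates are made-up values, not from the rocket dataset):

```
1,1,604,336,53,41,1
1,2,120,250,40,38,1
2,1,610,338,53,41,1
2,2,118,252,40,38,1
```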
The code is as follows:
import os
import sys
import xml.etree.ElementTree as ET

if __name__ == "__main__":
    xmls_path = "./data/rocket1/xml_labels/rocket3-2"
    target_path = "./data/rocket1/xml_labels/"
    f = open(os.path.join(target_path, 'rocket3-2gt.txt'), 'w')
    i = 0
    path_list = os.listdir(xmls_path)
    # Sort by the frame number embedded in the file name
    # (assumes the digits start at index 10, e.g. "rocket3-2-<frame>.xml")
    path_list.sort(key=lambda x: int(x[10:-4]))
    for xmlFilePath in path_list:
        print(os.path.join(xmls_path, xmlFilePath))
        try:
            tree = ET.parse(os.path.join(xmls_path, xmlFilePath))
            # get the root node
            root = tree.getroot()
        except Exception:
            print("parse %s fail!" % xmlFilePath)
            sys.exit()
        i += 1
        for obj in root.iter('object'):
            name = obj.find('name')
            # Map class names to tracklet IDs; skip unrecognized classes
            if name.text == 'Top':
                item = 1
            elif name.text == 'flag':
                item = 2
            elif name.text == 'bottom':
                item = 3
            else:
                continue
            bndbox = obj.find('bndbox')
            # Read coordinates by tag name rather than relying on child order
            xmin = int(bndbox.find('xmin').text)
            ymin = int(bndbox.find('ymin').text)
            xmax = int(bndbox.find('xmax').text)
            ymax = int(bndbox.find('ymax').text)
            width = xmax - xmin
            height = ymax - ymin
            # <frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf>
            # bb_top is the top edge of the box, i.e. ymin
            line = ','.join(str(v) for v in (i, item, xmin, ymin, width, height, 1))
            f.write(line + '\n')
    f.close()
3、The dataset directory structure is as follows:
src
└── data
    └── rocket1
        └── images
            └── train
                └── rocket2-1
                    ├── gt
                    │   └── gt.txt   (move the gt.txt generated in step 2 here)
                    ├── img1         (the JPG files of the dataset)
                    └── seqinfo.ini
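For convenience, the layout above can be created with a short script (a minimal sketch; the make_sequence_dirs helper is illustrative, with the src root and sequence name taken from the example above):

```python
import os

def make_sequence_dirs(root, seq):
    """Create the gt/ and img1/ folders for one sequence."""
    seq_dir = os.path.join(root, 'data', 'rocket1', 'images', 'train', seq)
    for sub in ('gt', 'img1'):
        os.makedirs(os.path.join(seq_dir, sub), exist_ok=True)
    return seq_dir

if __name__ == '__main__':
    print(make_sequence_dirs('src', 'rocket2-1'))
```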
The content of seqinfo.ini is shown below; adjust name, seqLength, imWidth and imHeight for your own dataset. Note that gen_labels.py in step 4 searches for newline-separated keys, so each key must be on its own line:
[Sequence]
name=rocket2-1
imDir=img1
frameRate=30
seqLength=26
imWidth=1920
imHeight=1080
imExt=.jpg
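gen_labels.py reads the image size with raw string searches; as an alternative sketch, Python's standard configparser can parse the same file more robustly (the read_seqinfo helper is illustrative, not part of the FairMOT repository):

```python
import configparser

def read_seqinfo(path):
    """Return (imWidth, imHeight, seqLength) parsed from a seqinfo.ini file."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    seq = cfg['Sequence']  # key lookups are case-insensitive
    return int(seq['imWidth']), int(seq['imHeight']), int(seq['seqLength'])
```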
4、Run gen_labels.py to generate a label file for each image from gt.txt, i.e. the data format required for training.
The data format FairMOT expects for training:
<class> <id> <x_center/img_width> <y_center/img_height> <w/img_width> <h/img_height>
class: target class
id: target ID
x_center/img_width: normalized column coordinate of the box center
y_center/img_height: normalized row coordinate of the box center
w/img_width: normalized width
h/img_height: normalized height
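Converting one gt.txt row into a FairMOT label line is simple arithmetic; a minimal sketch (the box values and the 1920x1080 image size are illustrative):

```python
def gt_to_label(tid, x, y, w, h, img_w, img_h):
    """Convert a top-left box (x, y, w, h) into a normalized FairMOT label line."""
    cx = x + w / 2  # box center, column
    cy = y + h / 2  # box center, row
    return '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}'.format(
        tid, cx / img_w, cy / img_h, w / img_w, h / img_h)

# A 120x60 box at (600, 300) in a 1920x1080 frame:
print(gt_to_label(1, 600, 300, 120, 60, 1920, 1080))
# → 0 1 0.343750 0.305556 0.062500 0.055556
```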
The gen_labels.py code is as follows:
import os
import os.path as osp

import numpy as np

def mkdirs(d):
    if not osp.exists(d):
        os.makedirs(d)

seq_root = './data/rocket1/images/train'
label_root = './data/rocket1/labels_with_ids/train'
mkdirs(label_root)
seqs = ['rocket2-1', 'rocket3-1', 'rocket3-2']

tid_curr = 0
tid_last = -1
for seq in seqs:
    # Read the image size from seqinfo.ini (keys must be on separate lines)
    seq_info = open(osp.join(seq_root, seq, 'seqinfo.ini')).read()
    seq_width = int(seq_info[seq_info.find('imWidth=') + 8:seq_info.find('\nimHeight')])
    seq_height = int(seq_info[seq_info.find('imHeight=') + 9:seq_info.find('\nimExt')])

    gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt')
    gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',')
    # Sort by tracklet ID, then by frame
    idx = np.lexsort(gt.T[:2, :])
    gt = gt[idx, :]

    seq_label_root = osp.join(label_root, seq, 'img1')
    mkdirs(seq_label_root)

    # The gt.txt written in step 2 has seven columns:
    # <frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf>
    for fid, tid, x, y, w, h, mark in gt:
        if mark == 0:
            continue
        fid = int(fid)
        tid = int(tid)
        # Assign globally unique tracklet IDs across all sequences
        if not tid == tid_last:
            tid_curr += 1
            tid_last = tid
        # Convert the top-left corner to the box center
        x += w / 2
        y += h / 2
        label_fpath = osp.join(seq_label_root, seq + '_' + '{:03d}.txt'.format(fid - 1))
        label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
            tid_curr, x / seq_width, y / seq_height, w / seq_width, h / seq_height)
        with open(label_fpath, 'a') as f:
            f.write(label_str)
5、After gen_labels.py has run, an img1 folder is created under src/data/rocket1/labels_with_ids/train/rocket2-1; img1 contains one txt label file for each image.
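Before training, it is worth sanity-checking the generated labels; a minimal sketch (the check_labels helper is illustrative, not part of FairMOT):

```python
import glob
import os

def check_labels(label_dir):
    """Return (file, line) pairs whose label lines do not have 6 fields
    with the four normalized box values inside [0, 1]."""
    bad = []
    for path in glob.glob(os.path.join(label_dir, '*.txt')):
        for line in open(path):
            parts = line.split()
            if len(parts) != 6 or not all(0.0 <= float(v) <= 1.0 for v in parts[2:]):
                bad.append((path, line.strip()))
    return bad
```

Run it on src/data/rocket1/labels_with_ids/train/rocket2-1/img1; an empty result means every line is well-formed.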
6、Train:
python train.py mot --exp_id rocket1 --gpus 0 --batch_size 6 --load_model ''
7、Test with the model_last.pth produced by training; a video file can be fed in directly:
python demo.py mot --load_model ../models/model_last.pth --conf_thres 0.4