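YOLOF has a fairly simple structure, but I have seen almost no TensorFlow implementations of it, so I wrote one myself. Limited by my hardware (a GTX 1060), I only trained and tested on VOC07, where mAP is about 0.66. Without further ado, here is the link for anyone who needs it: JiXuKong/YOLOF: Tensorflow implement YOLOF(You Only Look One Level Feaature) (github.com)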
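Usage:
1. Data preparation. Only XML-format labels are currently supported.
- First, put the labels and the images in two separate folders, as shown in the figure below:
- Then generate a txt file containing the label names: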
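import os

path = r''  # path to the folder holding the XML label files
path1 = r'xx.txt'  # path of the txt file to write
# Collect each label file's name without its extension.
name = []
for fil in os.listdir(path):
    name.append(fil.split('.')[0])
# Mode 'a' appends, so delete an old txt file before rerunning.
# The with block closes the file on exit; no explicit close() is needed.
with open(path1, 'a') as f1:
    for n in name:
        f1.write(n + '\n')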
- Example of the txt file generated for the VOC07 test set:
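Because the script above writes one file name per line, without extension, the list can be checked against both folders before training. The following is a minimal sketch and not part of the repository: `find_missing` is a hypothetical helper, and the `.jpg` image extension and the throwaway demo files are assumptions.

```python
import os
import tempfile


def find_missing(index_txt, img_dir, label_dir, img_ext='.jpg'):
    """Return the names listed in index_txt that lack an image or an XML label."""
    with open(index_txt, 'r') as f:
        stems = [line.strip() for line in f if line.strip()]
    missing = []
    for stem in stems:
        has_img = os.path.isfile(os.path.join(img_dir, stem + img_ext))
        has_xml = os.path.isfile(os.path.join(label_dir, stem + '.xml'))
        if not (has_img and has_xml):
            missing.append(stem)
    return missing


# Demo on a temporary directory: 000002 has a label but no image.
with tempfile.TemporaryDirectory() as root:
    img_dir = os.path.join(root, 'JPEGImages')
    label_dir = os.path.join(root, 'Annotations')
    os.makedirs(img_dir)
    os.makedirs(label_dir)
    for folder, fname in [(img_dir, '000001.jpg'),
                          (label_dir, '000001.xml'),
                          (label_dir, '000002.xml')]:
        open(os.path.join(folder, fname), 'w').close()
    index_txt = os.path.join(root, 'test.txt')
    with open(index_txt, 'w') as f:
        f.write('000001\n000002\n')
    missing = find_missing(index_txt, img_dir, label_dir)

print(missing)  # ['000002']
```

An empty result means every listed name has both an image and a label.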
2. Configure config.py
- Training data paths
ckecpoint_file = './checkpoint'  # directory where training checkpoints are saved
train_img_txt = r'F:\open_dataset\VOCdevkit\0712train\07train.txt'  # txt file listing the training labels
train_img_path = r'F:\open_dataset\VOCdevkit\0712train\JPEGImages'  # training image folder
train_label_path = r'F:\open_dataset\VOCdevkit\0712train\Annotations'  # training label folder
with open(train_img_txt, 'r') as f:
    image_index = [x.strip() for x in f.readlines()]
train_num = len(image_index)
test_img_txt = r'F:\open_dataset\VOCdevkit\07test\test.txt'  # test set
test_img_path = r'F:\open_dataset\VOCdevkit\07test\JPEGImages'
test_label_path = r'F:\open_dataset\VOCdevkit\07test\Annotations'
cache_path = './pkl'  # on the first run the labels are read and cached as pkl files, so they are not re-read later; see data/pascal_voc.py
- Weight paths
val_restore_path = './checkpoint/model.ckpt-35000'  # checkpoint used for testing
# train_restore_path = './checkpoint/model.ckpt-30000'
train_restore_path = './pretrained/resnet_v1_50_2016_08_28/resnet_v1_50.ckpt'  # pretrained weights, downloadable from the official TensorFlow model zoo
- Label classes, including the background label
classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
'train', 'tvmonitor']
- Data augmentation
random_crop = True
other_aug = True
multiscale = False  # not supported yet
- Some hyperparameters
#some network parameters
weight_decay = 0.0001
gradient_clip_by_norm = 10
batch_size = 6  # set to 1 for testing
max_epoch = 1
with open(test_img_txt, 'r') as f:
    image_index = [x.strip() for x in f.readlines()]
test_num = len(image_index)
class_num = len(classes)
image_size = 640
phase = True
# base_anchor = [32, 64, 128]
base_anchor = [16]
scale = np.array([2, 4, 8, 16, 32])
# scale = np.array([1])
aspect_ratio = np.array([1.0])#np.array([0.5, 1.0, 2.0])
anchors = scale.shape[0]*aspect_ratio.shape[0]
momentum_rate = 0.9
alpha = 0.25
gama = 2
class_weight = 1.0
regress_weight = 1.0
decay = 0.99
pi = 1e-2
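With base_anchor = [16] and the five scale values, each feature-map location gets five square anchors with side lengths 16 × scale = 32, 64, 128, 256 and 512 pixels, which is why anchors evaluates to 5. A minimal sketch of that arithmetic follows. `make_anchor_sizes` is a hypothetical helper, and treating aspect_ratio as height divided by width is an assumption, since the repository's own convention is not shown here.

```python
import numpy as np


def make_anchor_sizes(base_anchor, scale, aspect_ratio):
    """Return an array of (width, height) pairs, one per anchor at a location."""
    sizes = []
    for base in base_anchor:
        for s in scale:
            area = float(base * s) ** 2      # area of the square anchor
            for r in aspect_ratio:           # r = height / width (assumption)
                w = np.sqrt(area / r)
                h = w * r
                sizes.append((w, h))
    return np.array(sizes, dtype=np.float32)


base_anchor = [16]
scale = np.array([2, 4, 8, 16, 32])
aspect_ratio = np.array([1.0])

wh = make_anchor_sizes(base_anchor, scale, aspect_ratio)
print(wh[:, 0])   # anchor widths: 32, 64, 128, 256, 512
print(len(wh))    # 5, equal to scale.shape[0] * aspect_ratio.shape[0]
```

Switching to the commented-out aspect_ratio of [0.5, 1.0, 2.0] would give 15 anchors per location under the same arithmetic.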
- YOLOF parameters
#yolof
LR = 1e-3#0.12/64*batch_size/7#0.01/(16/batch_size)
DECAY_STEP = [train_num//batch_size*40, train_num//batch_size*50]
score_threshold=0.005
nms_iou_threshold=0.5
encoder_channels=512
block_mid_channels=128
num_residual_blocks=4
block_dilations=[2, 4, 6, 8]
cls_num_convs=2
reg_num_convs=4
match_times=4
max_detection_boxes_num=100
giou_loss = False
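Judging by their names, score_threshold, nms_iou_threshold and max_detection_boxes_num control post-processing of the predicted boxes. The sketch below shows how such parameters are typically applied, assuming a single class and boxes given as (x1, y1, x2, y2). `filter_detections` and `iou_one_to_many` are hypothetical names, and the repository may well do this differently, for example per class or with TensorFlow ops.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as x1, y1, x2, y2."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def filter_detections(boxes, scores, score_threshold=0.005,
                      nms_iou_threshold=0.5, max_detection_boxes_num=100):
    """Drop low scores, then run greedy NMS and cap the number of boxes."""
    mask = scores > score_threshold
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)              # highest score first
    kept = []
    while order.size > 0 and len(kept) < max_detection_boxes_num:
        best = order[0]
        kept.append(best)
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= nms_iou_threshold]   # discard heavy overlaps
    return boxes[kept], scores[kept]


boxes = np.array([[0, 0, 10, 10],
                  [1, 1, 11, 11],     # overlaps the first box (IoU about 0.68)
                  [50, 50, 60, 60],
                  [0, 0, 5, 5]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7, 0.001], dtype=np.float32)

kept_boxes, kept_scores = filter_detections(boxes, scores)
print(kept_boxes)   # the first and third boxes survive
```

A threshold as low as 0.005 keeps many low-confidence boxes, which favours recall over precision.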
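3. Training
- Training command
python train.py
- TensorBoard is supported
tensorboard --logdir='' --host=127.0.0.1
- Training curve visualization
- Detection result visualization
4. eval.py
- In config.py, set batch_size to 1 and set val_restore_path (the trained weights), then run:
python eval.py
- Validation results
The official implementation uses fairly large training images (800~1333); I set the size to 640 here. The results below were obtained by training and validating on the VOC07 dataset: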
aeroplane ap: 0.6827470379199196
bicycle ap: 0.7396616588891274
bird ap: 0.683572632644542
boat ap: 0.447618926300844
bottle ap: 0.4345145595736113
bus ap: 0.7385106924225606
car ap: 0.7482553476390383
cat ap: 0.8446047234792089
chair ap: 0.4346667353657149
cow ap: 0.7053577375421849
diningtable ap: 0.5357025440712686
dog ap: 0.787412696249006
horse ap: 0.7761393593166481
motorbike ap: 0.7321121960518314
person ap: 0.7381098784497359
pottedplant ap: 0.4002145788146931
sheep ap: 0.6493722707980141
sofa ap: 0.5978992802139366
train ap: 0.7952630542662096
tvmonitor ap: 0.6918367797115978
Epoch: 45 mAP: 0.6581786344859846
Epoch: 45 mRecall: 0.9057230241905329
Epoch: 45 mPrecision: 0.0025352130003605832
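The reported mAP is the unweighted mean of the 20 per-class AP values listed above. A minimal sketch that recomputes it from lines in that printed format; `parse_ap_lines` is a hypothetical helper and not part of the repository.

```python
import re


def parse_ap_lines(text):
    """Return {class_name: ap} for lines shaped like '<class> ap: <float>'."""
    pattern = re.compile(r'^(\w+) ap: ([0-9.]+)$', re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


log = """\
aeroplane ap: 0.6827470379199196
bicycle ap: 0.7396616588891274
bird ap: 0.683572632644542
boat ap: 0.447618926300844
bottle ap: 0.4345145595736113
bus ap: 0.7385106924225606
car ap: 0.7482553476390383
cat ap: 0.8446047234792089
chair ap: 0.4346667353657149
cow ap: 0.7053577375421849
diningtable ap: 0.5357025440712686
dog ap: 0.787412696249006
horse ap: 0.7761393593166481
motorbike ap: 0.7321121960518314
person ap: 0.7381098784497359
pottedplant ap: 0.4002145788146931
sheep ap: 0.6493722707980141
sofa ap: 0.5978992802139366
train ap: 0.7952630542662096
tvmonitor ap: 0.6918367797115978
"""

aps = parse_ap_lines(log)
mean_ap = sum(aps.values()) / len(aps)
print(len(aps), round(mean_ap, 4))  # prints: 20 0.6582
```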
5. demo.py
- Set the image to test and the path where the result is saved
if __name__ == '__main__':
    img_path = './asset/000025.jpg'
    saved_img = './asset/' + 'pred' + img_path.split('/')[-1]
- Detection result