Building the dataset (same format as YOLOv5):
Each image file and its label file are stored at matching paths, as follows:
```
../coco/images/train2017/000000109622.jpg # image
../coco/labels/train2017/000000109622.txt # label
```
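The pairing rule above can be sketched as a small helper. This is only an illustration of the convention, assuming the label path is obtained by swapping the last `images` directory for `labels` and the extension for `.txt`; the function name `image_to_label_path` is hypothetical and is not part of the repository.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path: str) -> str:
    """Return the label path that pairs with an image path.

    Assumption: labels mirror the image tree, with the last 'images'
    directory replaced by 'labels' and the file extension replaced by '.txt'.
    Raises ValueError if the path contains no 'images' directory.
    """
    parts = list(PurePosixPath(image_path).parts)
    # Index of the LAST 'images' component, in case the name appears twice.
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(image_to_label_path("../coco/images/train2017/000000109622.jpg"))
```

A helper like this is handy for checking that every image in a split has a label file before training starts.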
Three additional txt files must also be generated: train.txt, val.txt, and test.txt.
This is val.txt:
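Assuming each of these files simply lists one image path per line (as in the YOLOv5 convention), they could be generated along the following lines. The function name `write_split_lists`, the random split, and the 80/10/10 ratios are illustrative assumptions; if your images are already separated into per-split directories, list each directory instead.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_split_lists(image_dir, out_dir, val_frac=0.1, test_frac=0.1, seed=0):
    """Write train.txt, val.txt and test.txt, one image path per line.

    Assumption: a random split of a single image directory is acceptable.
    Returns a dict mapping each list file name to its number of entries.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible

    n_val = int(len(images) * val_frac)
    n_test = int(len(images) * test_frac)
    splits = {
        "val.txt": images[:n_val],
        "test.txt": images[n_val:n_val + n_test],
        "train.txt": images[n_val + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        out.joinpath(name).write_text("".join(f"{p.as_posix()}\n" for p in paths))
    return {name: len(paths) for name, paths in splits.items()}
```

For example, `write_split_lists("../coco/images/train2017", "../coco")` would place the three list files next to the `images` and `labels` directories.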
Downloading the pretrained model:
Baidu link: https://pan.baidu.com/s/1nyQlH-GHrmddCEkuv-VmAg
Extraction code: 78bg
Code:
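This version was reproduced by the second author of YOLOv4, **Chien-Yao Wang**, who is also the first author of CSPNet. It is worth adding that both YOLOv4 and YOLOv5 use CSPNet. This PyTorch version of YOLOv4 is built on top of the ultralytics YOLOv3, which is probably the strongest PyTorch reimplementation of YOLOv3: https://github.com/ultralytics/yolov3. We will use this version of YOLOv4 to train on our own dataset, and walk through the code changes, training, and testing in full.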
Problems encountered
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/thop/vision/basic_hooks.py", line 69, in count_normalization
m.total_ops += flops
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
hook_result = hook(self, input, result)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/models.py", line 298, in forward_once
x = module(x)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/models.py", line 244, in forward
return self.forward_once(x)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/train.py", line 261, in train
pred = model(imgs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/train.py", line 409, in <module>
train() # train normally
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
This happens because train.py only handles the multi-GPU and CPU cases and has no branch for a single GPU, so the model is never moved to the GPU. Change the code to:
if device.type != 'cpu' and torch.cuda.device_count() > 1 and torch.distributed.is_available():
    dist.init_process_group(backend='nccl',  # 'distributed backend'
                            init_method='tcp://127.0.0.1:9999',  # distributed training init method
                            world_size=1,  # number of nodes for distributed training
                            rank=0)  # distributed training node rank
    model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
    model.yolo_layers = model.module.yolo_layers  # move yolo layer indices to top level
else:
    model = model.to(device)  # move the model to the current device (single GPU or CPU)
    model.yolo_layers = model.yolo_layers  # no-op: already at top level without the DDP wrapper
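The branch condition in the fix can be read as a small decision rule. The sketch below restates it in plain Python without importing torch, so the three cases are easy to check by hand; `choose_parallel_mode` is a hypothetical name and does not exist in train.py.

```python
def choose_parallel_mode(device_type: str, gpu_count: int, dist_available: bool) -> str:
    """Mirror the condition used in the patched train.py.

    'ddp'           -> wrap the model in DistributedDataParallel
    'single_device' -> just call model.to(device) (single GPU or CPU)
    """
    if device_type != "cpu" and gpu_count > 1 and dist_available:
        return "ddp"
    return "single_device"


for case in [("cuda", 1, True), ("cuda", 2, True), ("cpu", 0, True)]:
    print(case, "->", choose_parallel_mode(*case))
```

The single-GPU case, `("cuda", 1, True)`, is the one that the fix adds a branch for: it now falls through to the plain `model.to(device)` path.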
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
shape '[8, 3, 10, 20, 20]' is invalid for input of size 816000
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/models.py", line 197, in forward
p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous() # prediction
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/models.py", line 296, in forward_once
yolo_out.append(module(x, out))
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/models.py", line 244, in forward
return self.forward_once(x)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/train.py", line 268, in train
pred = model(imgs)
File "/home/user-lbyjh/Pytorch_YOLO-v4-master/train.py", line 416, in <module>
train() # train normally
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user-lbyjh/anaconda3/envs/yjhtorch16/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
RuntimeError: shape '[8, 3, 10, 20, 20]' is invalid for input of size 816000
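To fix this, edit the .cfg file: in the convolutional layer immediately before each [yolo] section, change filters so that it matches your own dataset.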
[convolutional]
size=1
stride=1
pad=1
# filters=(5+classes)*3
filters=30
activation=linear
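The numbers in the error message can be checked against the `filters=(5+classes)*3` rule. The sketch below is plain arithmetic with hypothetical helper names. It infers 5 classes from `filters=30`, and it assumes the failing config still had 255 filters, which is the value for 80 classes and is what the reported input size of 816000 implies.

```python
def yolo_head_filters(num_classes: int, anchors_per_layer: int = 3) -> int:
    """filters for the conv layer before a [yolo] section: (5 + classes) * anchors."""
    return (5 + num_classes) * anchors_per_layer


def conv_output_size(batch: int, filters: int, ny: int, nx: int) -> int:
    """Number of elements produced by the conv layer feeding the [yolo] layer."""
    return batch * filters * ny * nx


def view_target_size(batch: int, num_classes: int, ny: int, nx: int, na: int = 3) -> int:
    """Number of elements required by p.view(bs, na, no, ny, nx), with no = 5 + classes."""
    return batch * na * (5 + num_classes) * ny * nx


# Failing case: 255 filters (80 classes) feeding a head configured for 5 classes.
print(conv_output_size(8, yolo_head_filters(80), 20, 20))  # 816000, as in the error
print(view_target_size(8, 5, 20, 20))                      # 96000, what view() expects

# After the fix: 30 filters for 5 classes, so both sizes agree.
print(conv_output_size(8, yolo_head_filters(5), 20, 20))   # 96000
```

The same check applies to every [yolo] section in the file, since each one has its own preceding convolutional layer.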