随手记,以后有时间再整理吧
问题一:环境配置后cuda与pytorch不符
detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
requirements: opencv-python>=4.1.2 not found and is required by YOLOv5, attempting auto-update...
requirements: 'pip install opencv-python>=4.1.2' skipped (offline)
/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
Downloading https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.0M/14.0M [00:23<00:00, 613kB/s]
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/detect.py", line 244, in <module>
main(opt)
File "/home/milk/yolo/yolov5/detect.py", line 239, in main
run(**vars(opt))
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/milk/yolo/yolov5/detect.py", line 79, in run
model = DetectMultiBackend(weights, device=device, dnn=dnn)
File "/home/milk/yolo/yolov5/models/common.py", line 305, in __init__
model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
File "/home/milk/yolo/yolov5/models/experimental.py", line 98, in attempt_load
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval()) # FP32 model
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in float
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
File "/home/milk/yolo/yolov5/models/yolo.py", line 240, in _apply
self = super()._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in <lambda>
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Process finished with exit code 1
解决方法一:
在该环境下重新安装torch,以下命令从pytorch官网下载----Start Locally | PyTorch
成功解决!
yolov5开始训练记录
/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
train: weights=yolov5s.pt, cfg=, data=data/Underwater.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=16, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu113 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml nc=80 with nc=4
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 24273 models.yolo.Detect [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 270 layers, 7030417 parameters, 7030417 gradients
Transferred 343/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight, 60 weight (no decay), 60 bias
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/utils/datasets.py", line 406, in __init__
raise Exception(f'{prefix}{p} does not exist')
Exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/train.py", line 625, in <module>
main(opt)
File "/home/milk/yolo/yolov5/train.py", line 522, in main
train(opt.hyp, opt, device, callbacks)
File "/home/milk/yolo/yolov5/train.py", line 212, in train
train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 98, in create_dataloader
dataset = LoadImagesAndLabels(path, imgsz, batch_size,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 411, in __init__
raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ['/home/milk/yolo/yolov5/data/tain.txt']: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
Process finished with exit code 1
修改yolov5s.yaml 中的nc=4
/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
train: weights=yolov5s.pt, cfg=, data=data/Underwater.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=16, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu113 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml nc=80 with nc=4
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 24273 models.yolo.Detect [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 270 layers, 7030417 parameters, 7030417 gradients
Transferred 343/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight, 60 weight (no decay), 60 bias
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/utils/datasets.py", line 406, in __init__
raise Exception(f'{prefix}{p} does not exist')
Exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/train.py", line 625, in <module>
main(opt)
File "/home/milk/yolo/yolov5/train.py", line 522, in main
train(opt.hyp, opt, device, callbacks)
File "/home/milk/yolo/yolov5/train.py", line 212, in train
train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 98, in create_dataloader
dataset = LoadImagesAndLabels(path, imgsz, batch_size,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 411, in __init__
raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ['/home/milk/yolo/yolov5/data/tain.txt']: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
Process finished with exit code 1
训练出错,修改
parser.add_argument('--cfg', type=str, default=ROOT / 'models/yolov5s.yaml', help='model.yaml path')
详细检查.yaml文件,发现路径打错一个字母,训练开始,预估10h
还有一种情况需要注意 YOLOv5中的datasets文件中使用的数据集文件名是images 如果你的文件名不是这个,训练时会出现找不到标签错误
解决:
修改 utils 下的 datasets.py 文件中下列位置的 ‘Images’ 处为你存放图片文件夹名称,我得是Images
def img2label_paths(img_paths):
# Define label paths as a function of image paths
sa, sb = os.sep + 'Images' + os.sep, os.sep + 'labels' + os.sep # /images/, /labels/ substrings
return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]