Some problems encountered when training YOLOv5 v2.0

wendk_w

Published 2024-06-09 10:35:20

Tags: YOLO

Article link: https://blog.csdn.net/wendk_w/article/details/139558654

1.UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 176: illegal multibyte sequence

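This error occurs because origincar.yaml is read with the platform's default encoding (for example GBK), but the file contains byte sequences that are illegal in that encoding. One way to solve it is to specify the file encoding explicitly as utf-8 when the file is opened.

The fix I used was simpler: delete all the comments from the yaml file and run training again. This works when the comments are the only non-ASCII text in the file.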
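For readers who would rather keep their comments, the explicit-encoding route mentioned above can be sketched with the standard library alone. The file name example.yaml and its contents are made up for illustration; they are not the real origincar.yaml. The sketch writes a UTF-8 file with a Chinese comment, shows that reading it as GBK fails, and that passing encoding="utf-8" to open() succeeds.

```python
import os
import tempfile

# Hypothetical yaml content with a Chinese comment (not the real origincar.yaml).
yaml_text = "nc: 1  # 一个类别\nnames: ['car']\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(yaml_text)

    # Reading the UTF-8 file as GBK reproduces the UnicodeDecodeError.
    try:
        with open(path, encoding="gbk") as f:
            f.read()
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # Naming the encoding explicitly reads the file correctly on any platform.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(gbk_failed, text == yaml_text)
```

In the training script the same idea would mean adding encoding='utf-8' to whichever open() call feeds the yaml loader; where exactly that call sits depends on your copy of train.py.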

2.RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

The full error message is:

Traceback (most recent call last):
  File "D:\pycharm\yolov5v2.0\train.py", line 469, in <module>
    train(hyp, tb_writer, opt, device)
  File "D:\pycharm\yolov5v2.0\train.py", line 291, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # scaled by batch_size
  File "D:\pycharm\yolov5v2.0\utils\utils.py", line 443, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "D:\pycharm\yolov5v2.0\utils\utils.py", line 532, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

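This error occurs because some of the tensors used inside build_targets are on different devices (CPU and GPU). In the traceback, at[j] fails because at, the tensor being indexed, is on the CPU (the "(cpu)" at the end of the message) while the index j is on another device. To fix it, make sure all the tensors involved are on the same device.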

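Fix: modify the build_targets function as follows.

Key changes:

Get the device of the targets tensor: at the start of the function, read it with device = targets.device.
Move the related tensors to that device:
- The anchors tensor is moved to device in each loop iteration.
- When computing gain, the torch.tensor call is given the device=device argument.
- The at tensor is created directly on device.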

def build_targets(p, targets, model):
    # Build targets for compute_loss(), input targets(image, class, x, y, w, h)
    device = targets.device
    det = model.module.model[-1] if isinstance(model, (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)) else model.model[-1]  # Detect() module
    na, nt = det.na, targets.shape[0]  # number of anchors, targets
    tcls, tbox, indices, anch = [], [], [], []
    gain = torch.ones(6, device=device)  # normalized to gridspace gain
    off = torch.tensor([[1, 0], [0, 1], [-1, 0], [0, -1]], device=device).float()  # overlap offsets
    at = torch.arange(na, device=device).view(na, 1).repeat(1, nt)  # anchor tensor, same as .repeat_interleave(nt)

    g = 0.5  # offset
    style = 'rect4'
    for i in range(det.nl):
        anchors = det.anchors[i].to(device)  # Move anchors to the same device
        gain[2:] = torch.tensor(p[i].shape, device=device)[[3, 2, 3, 2]]  # xyxy gain

        # Match targets to anchors
        a, t, offsets = [], targets * gain, 0
        if nt:
            r = t[None, :, 4:6] / anchors[:, None]  # wh ratio
            j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t']  # compare
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2))
            a, t = at[j], t.repeat(na, 1, 1)[j]  # filter

            # overlaps
            gxy = t[:, 2:4]  # grid xy
            z = torch.zeros_like(gxy)
            if style == 'rect2':
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T
                a, t = torch.cat((a, a[j], a[k]), 0), torch.cat((t, t[j], t[k]), 0)
                offsets = torch.cat((z, z[j] + off[0], z[k] + off[1]), 0) * g
            elif style == 'rect4':
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T
                l, m = ((gxy % 1. > (1 - g)) & (gxy < (gain[[2, 3]] - 1.))).T
                a, t = torch.cat((a, a[j], a[k], a[l], a[m]), 0), torch.cat((t, t[j], t[k], t[l], t[m]), 0)
                offsets = torch.cat((z, z[j] + off[0], z[k] + off[1], z[l] + off[2], z[m] + off[3]), 0) * g

        # Define
        b, c = t[:, :2].long().T  # image, class
        gxy = t[:, 2:4]  # grid xy
        gwh = t[:, 4:6]  # grid wh
        gij = (gxy - offsets).long()
        gi, gj = gij.T  # grid xy indices

        # Append
        indices.append((b, a, gj, gi))  # image, anchor, grid indices
        tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
        anch.append(anchors[a])  # anchors
        tcls.append(c)  # class

    return tcls, tbox, indices, anch
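The core idea of the change, stripped of the YOLOv5 details, can be sketched as follows. This is only an illustration: the shapes, the random targets, and the 0.5 threshold are invented, and the sketch falls back to the CPU when no GPU is present so that it runs anywhere.

```python
import torch

# Use the GPU when one is present; otherwise the sketch still runs on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

na, nt = 3, 4  # illustrative anchor and target counts
targets = torch.rand(nt, 6, device=device)  # stand-in for (image, class, x, y, w, h)

# Create the anchor-index tensor on the same device as the targets,
# mirroring torch.arange(na, device=device) in build_targets.
at = torch.arange(na, device=targets.device).view(na, 1).repeat(1, nt)

# A boolean mask computed from targets lives on the same device as targets.
j = targets[None, :, 4].repeat(na, 1) > 0.5  # shape (na, nt)

a = at[j]  # mask and indexed tensor share a device, so the indexing is valid
same_device = a.device == targets.device
print(at.shape, j.shape, same_device)
```

Deriving the device from targets rather than hard-coding cuda:0 keeps the function usable for both CPU and GPU training.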

3.TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

The full error message is:

Traceback (most recent call last):
  File "D:\pycharm\yolov5v2.0\train.py", line 469, in <module>
    train(hyp, tb_writer, opt, device)
  File "D:\pycharm\yolov5v2.0\train.py", line 340, in train
    results, maps, times = test.test(opt.data,
  File "D:\pycharm\yolov5v2.0\test.py", line 176, in test
    plot_images(img, output_to_target(output, width, height), paths, str(f), names)  # predictions
  File "D:\pycharm\yolov5v2.0\utils\utils.py", line 914, in output_to_target
    return np.array(targets)
  File "D:\Anaconda3\envs\yolov5\lib\site-packages\torch\_tensor.py", line 956, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

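This error is caused by trying to convert a tensor that lives on a CUDA device directly into a NumPy array. The tensor must first be moved from the CUDA device to the CPU by calling .cpu(), and only then converted.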
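As a minimal sketch of that conversion, the snippet below builds a small tensor (the values are invented and only mimic one detection row) and converts it safely. It picks the GPU if one is available and otherwise runs on the CPU, where .cpu() simply returns the tensor unchanged.

```python
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# One made-up detection row: x1, y1, x2, y2, confidence, class.
pred = torch.tensor([[10.0, 20.0, 110.0, 220.0, 0.9, 2.0]], device=device)

# .detach() drops any autograd history, .cpu() copies the data to host memory
# (or returns the same tensor if it is already there), and .numpy() then succeeds.
pred_np = pred.detach().cpu().numpy()
print(type(pred_np), pred_np.shape)
```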

Fix: modify the output_to_target function

def output_to_target(output, width, height):
    # Convert model output to target format [batch_id, class_id, x, y, w, h, conf]
    targets = []

    if isinstance(output, torch.Tensor):
        output = output.cpu().numpy()

    if isinstance(output, np.ndarray):
        output = [output]  # wrap a single NumPy array in a list so everything is handled the same way

    for i, o in enumerate(output):
        if o is not None:
            if isinstance(o, torch.Tensor):
                o = o.cpu().numpy()  # make sure the tensor is converted to a NumPy array

            for pred in o:
                box = pred[:4]
                w = (box[2] - box[0]) / width
                h = (box[3] - box[1]) / height
                x = box[0] / width + w / 2
                y = box[1] / height + h / 2
                conf = pred[4]
                cls = int(pred[5])

                targets.append([i, cls, x, y, w, h, conf])

    return np.array(targets)

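The function first checks whether output is a single tensor and, if so, converts it to a NumPy array. A single NumPy array is then wrapped in a list so that every case goes through the same loop. Inside the loop, any element that is still a tensor is moved to the CPU and converted to a NumPy array. Each prediction is converted from corner coordinates to a normalized center x, y, width and height, appended to the targets list, and the list is finally returned as a NumPy array.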
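The per-box arithmetic inside the loop can be checked by hand with a small standalone example. The helper name xyxy_to_norm_xywh and the sample box are hypothetical and exist only for this illustration; the formulas are the ones used in output_to_target.

```python
def xyxy_to_norm_xywh(box, width, height):
    """Convert (x1, y1, x2, y2) in pixels to normalized (x_center, y_center, w, h)."""
    x1, y1, x2, y2 = box
    w = (x2 - x1) / width    # normalized box width
    h = (y2 - y1) / height   # normalized box height
    x = x1 / width + w / 2   # normalized center x
    y = y1 / height + h / 2  # normalized center y
    return x, y, w, h


# A 200x200 pixel box on a 400x400 image.
result = xyxy_to_norm_xywh((100, 50, 300, 250), width=400, height=400)
print(result)  # (0.5, 0.375, 0.5, 0.5)
```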

To check whether the GPU is available:

python -c "import torch; print(torch.cuda.is_available())"

If it prints True, the GPU is available.

In train.py, you can then change the default of the --device argument from '' to '0':

parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
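The effect of changing the default can be seen with a small standalone parser. The build_parser helper is hypothetical and contains only this one argument; the real train.py defines many more.

```python
import argparse


def build_parser():
    # Hypothetical cut-down parser holding only the --device argument.
    parser = argparse.ArgumentParser()
    parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    return parser


# No flag given: the new default '0' is used.
default_opt = build_parser().parse_args([])
# An explicit flag still overrides the default.
cpu_opt = build_parser().parse_args(['--device', 'cpu'])
print(default_opt.device, cpu_opt.device)
```

Editing the default is optional: passing --device 0 on the command line has the same effect without touching the script.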
