Jetson/PyTorch - 005 Faster R-CNN Object Detection

1. Background

《Colab/PyTorch - 004 Torchvision Semantic Segmentation》 already called out the difference between classification and object detection:

  • Classification deals with images in which the object of interest dominates the frame, and those images are usually fairly large, typically 200+ pixels on a side
  • In real-world object detection, the objects themselves often occupy far fewer pixels, especially on mobile and embedded devices, sometimes only 16x16 pixels

In other words, as the scope of the problem widens, its difficulty grows non-linearly.
And in practice the workflow usually looks like this:

  1. First, we do not know in advance where the objects to be classified are located
  2. Then, an object detection algorithm finds the objects and separates them from the background
  3. Finally, each separated object is classified

This article studies and discusses this next-level problem.

2. Object Detection

Simply put, object detection is a two-step process:

  • Find the bounding boxes that contain objects, making sure each bounding box contains exactly one object.
  • Classify the image inside each bounding box and assign it a label.

The following subsections trace the steps in the evolution of the Faster R-CNN detection architecture:

2.1 The Sliding Window Approach

Most classical computer-vision detection techniques, such as HAAR cascades and HOG + SVM, use a sliding-window approach to detect objects.

In this approach, a window slides across the image. All pixels inside the window are cropped out and sent to an image classifier.

If the classifier recognizes a known object, the bounding box and class label are stored. Otherwise, the next window is evaluated.

The sliding-window approach is computationally very expensive: to detect the objects in an input image, windows of different scales and aspect ratios have to be evaluated at every pixel location.

Because of this cost, sliding windows are only used when a single object class with a fixed aspect ratio is being detected. For example, the HOG + SVM and HAAR-based face detectors in OpenCV use a sliding window, and so does the famous Viola-Jones face detector. In the face-detector case the complexity stays manageable because only square bounding boxes at different scales are evaluated.
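
To make the idea concrete, a minimal sketch of a sliding-window detector is shown below. It is illustrative only and not part of the Faster R-CNN pipeline; classify_window is a hypothetical classifier that returns a (label, score) pair for a crop.

def sliding_window_detect(image, classify_window, win_size=(64, 64), stride=16, score_thresh=0.9):
    """Scan one window size over a numpy image; real detectors repeat this over
    many scales and aspect ratios, which is what makes the approach so expensive."""
    detections = []
    h, w = image.shape[:2]
    win_h, win_w = win_size
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            crop = image[y:y + win_h, x:x + win_w]          # pixels inside the window
            label, score = classify_window(crop)            # hypothetical classifier
            if score > score_thresh:
                detections.append(((x, y, x + win_w, y + win_h), label, score))
    return detections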

2.2 R-CNN Object Detection

Image classifiers based on convolutional neural networks (CNNs) became popular after a CNN won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012.

Since every object detector has an image classifier at its core, the invention of CNN-based object detectors became inevitable.

There were two challenges to overcome:

  1. Compared with traditional techniques such as HOG + SVM or HAAR cascades, CNN-based image classifiers are computationally very expensive.
  2. The computer-vision community's ambitions kept growing: people wanted a multi-class detector that could handle different aspect ratios, not just different scales.

The sliding-window approach was therefore ruled out; it was simply too expensive.

Researchers started experimenting with a new idea: train a machine-learning model that proposes the locations of bounding boxes likely to contain objects. These boxes are called region proposals or object proposals.

A region proposal is simply a bounding box with some small probability of containing an object; the proposal algorithm neither knows nor cares which object is inside the box.

A region-proposal algorithm outputs a list of a few hundred bounding boxes at different locations, scales and aspect ratios. Most of these boxes do not contain any object.

Why are region proposals still useful, when we have just seen that they are not accurate?

Evaluating an image classifier on the few hundred boxes produced by a region-proposal algorithm is far cheaper than evaluating it on the hundreds of thousands or even millions of boxes required by the sliding-window approach. So, up to a point, region-proposal algorithms remain useful and convenient.

One of the earliest methods to use region proposals was named R-CNN (Regions with CNN features) by Ross Girshick et al.

They used an algorithm called Selective Search to detect 2000 region proposals and ran a CNN + SVM based image classifier on those 2000 bounding boxes.

R-CNN's accuracy was state of the art at the time, but it was still very slow (18-20 seconds per image on a GPU).
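
For intuition, region proposals can be generated with OpenCV's Selective Search implementation. The sketch below is an assumption about the reader's environment: it needs the opencv-contrib-python package, and 'people.jpg' is simply the test image downloaded later in section 3.5.

# Hedged sketch: Selective Search region proposals with OpenCV (requires opencv-contrib-python)
import cv2

img = cv2.imread('people.jpg')                                     # any BGR test image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()                                   # fast mode; quality mode is slower
rects = ss.process()                                               # array of (x, y, w, h) proposals
print(len(rects), "proposals, first few:", rects[:3])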

2.3 Fast R-CNN Object Detection

In R-CNN, every bounding box was classified independently by the image classifier. With 2000 region proposals, the classifier computed a separate feature map for each one, which made the process expensive.

In follow-up work, Ross Girshick proposed a method called Fast R-CNN that significantly sped up object detection.

The idea is to compute a single feature map for the entire image instead of 2000 feature maps for the 2000 region proposals. For each region proposal, a Region of Interest (RoI) pooling layer extracts a fixed-length feature vector from that shared feature map (a small sketch of this step follows the list below). Each feature vector is then used for two purposes:

  1. Classify the region as one of the classes (e.g. dog, cat, background).
  2. Improve the accuracy of the original bounding box with a bounding-box regressor.
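
A minimal sketch of RoI pooling using torchvision.ops.roi_pool is shown below: one shared feature map, a handful of proposal boxes, and a fixed-size output per box. Note that torchvision's Faster R-CNN actually uses RoI Align over FPN feature maps; the tensor sizes here are made up purely for illustration.

import torch
from torchvision.ops import roi_pool

feature_map = torch.rand(1, 256, 50, 50)               # [n, c, h, w] for the whole image
# proposals in feature-map coordinates: (batch_index, x1, y1, x2, y2)
proposals = torch.tensor([[0, 4.0, 4.0, 20.0, 28.0],
                          [0, 10.0, 2.0, 40.0, 30.0]])
pooled = roi_pool(feature_map, proposals, output_size=(7, 7))
print(pooled.shape)                                     # torch.Size([2, 256, 7, 7])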


2.4 Faster R-CNN Object Detection

In Fast R-CNN, even though the computation for classifying the 2000 region proposals was shared, the part that generated the region proposals did not share any computation with the part that performed image classification.

In the follow-up work called Faster R-CNN, the key insight was that the two parts, computing region proposals and classifying them, could work from the same feature map and therefore share the computational load.

A convolutional neural network produces a feature map of the image, which is used both to train a region proposal network and to train the image classifier. Because of this shared computation, object-detection speed improved significantly.


3. Object Detection with PyTorch

In this part, we will learn how to run a Faster R-CNN object detector with PyTorch.

# import necessary libraries
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as T
import torchvision
import numpy as np
import cv2
import os

3.1 Inputs and Outputs

The pretrained Faster R-CNN ResNet-50 model expects input image tensors of the form [n, c, h, w], with a minimum side length of 800 px, where:

  • n is the number of images
  • c is the number of channels, which is 3 for RGB images
  • h is the height of the image
  • w is the width of the image

The model returns:

  • Bounding boxes [x0, y0, x1, y1] for all detections, with shape (N, 4), where N is the number of objects the model predicts to be present in the image.
  • A class label for each detection.
  • A confidence score for each predicted label.
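
As a quick sanity check of this contract, the sketch below runs the detector on a random tensor; it assumes model has already been loaded as shown in section 3.2.

# Sketch: the detector takes a list of [c, h, w] tensors with values in [0, 1]
# and returns one dict per image with 'boxes', 'labels' and 'scores'
dummy = torch.rand(3, 800, 1066)          # one random RGB "image", shorter side 800
with torch.no_grad():
    out = model([dummy])                  # list with one result dict per input image
print(out[0].keys())                      # dict_keys(['boxes', 'labels', 'scores'])
print(out[0]['boxes'].shape)              # [N, 4] boxes as [x0, y0, x1, y1]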

3.2 Pretrained Model

Download the pretrained model from torchvision with the following code:

# get the pretrained model from torchvision.models
# Note: pretrained=True will get the pretrained weights for the model.
# model.eval() to use the model for inference
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
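
Note that recent torchvision releases (0.13 and later) deprecate the pretrained=True flag in favor of an explicit weights argument. Assuming such a version is installed, an equivalent call would be:

# Equivalent call on newer torchvision versions (weights enum available since 0.13)
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()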

Define the class names listed in the official PyTorch documentation for this pretrained model:

# Class labels from official PyTorch documentation for the pretrained model
# Note that there are some N/A's 
# for complete list check https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
# we will use the same list for this notebook
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

3.3 Model Prediction

Let us define a function that takes an image path and uses the model to make predictions on that image.

def get_prediction(img_path, threshold):
  """
  get_prediction
    parameters:
      - img_path - path of the input image
      - threshold - threshold value for prediction score
    method:
      - Image is obtained from the image path
      - the image is converted to image tensor using PyTorch's Transforms
      - image is passed through the model to get the predictions
      - class, box coordinates are obtained, but only prediction score > threshold
        are chosen.
    
  """
  img = Image.open(img_path)
  transform = T.Compose([T.ToTensor()])
  img = transform(img)
  pred = model([img])
  pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(pred[0]['labels'].numpy())]
  pred_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in list(pred[0]['boxes'].detach().numpy())]
  pred_score = list(pred[0]['scores'].detach().numpy())
  pred_t = [pred_score.index(x) for x in pred_score if x>threshold][-1]
  pred_boxes = pred_boxes[:pred_t+1]
  pred_class = pred_class[:pred_t+1]
  return pred_boxes, pred_class

The function performs the following steps:

  1. The image is read from the given image path
  2. The image is converted to an image tensor using PyTorch's transforms
  3. The image tensor is passed through the model to obtain predictions
  4. The classes and box coordinates are collected, keeping only the predictions whose score exceeds the threshold.

3.4 Object Detection Pipeline

def object_detection_api(img_path, threshold=0.5, rect_th=3, text_size=3, text_th=3):
  """
  object_detection_api
    parameters:
      - img_path - path of the input image
      - threshold - threshold value for prediction score
      - rect_th - thickness of bounding box
      - text_size - size of the class label text
      - text_th - thickness of the text
    method:
      - prediction is obtained from get_prediction method
      - for each prediction, bounding box is drawn and text is written 
        with opencv
      - the final image is displayed
  """
  boxes, pred_cls = get_prediction(img_path, threshold)
  img = cv2.imread(img_path)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  for i in range(len(boxes)):
    #cv2.rectangle(img, boxes[i][0], boxes[i][1], color=(0, 255, 0), thickness=rect_th)
    cv2.rectangle(img, (int(boxes[i][0][0]), int(boxes[i][0][1])), (int(boxes[i][1][0]), int(boxes[i][1][1])), color=(0, 255, 0), thickness=rect_th)
    cv2.putText(img,pred_cls[i], (int(boxes[i][0][0]), int(boxes[i][0][1])), cv2.FONT_HERSHEY_SIMPLEX, text_size, (0,255,0),thickness=text_th)
  plt.figure(figsize=(20,30))
  plt.imshow(img)
  plt.xticks([])
  plt.yticks([])
  plt.show()

The pipeline works as follows:

  1. The predictions are obtained from the get_prediction method
  2. For each prediction, the bounding box is drawn and the class label is written with OpenCV
  3. The final image is displayed

3.5 Inference

Download images and run inference on them:

def download_image(url, filename):
    """
    Downloads the image from the given URL if the file does not exist; does nothing if the file already exists.
    
    Parameters:
    url (str): The URL of the image.
    filename (str): The filename to save the image.
    """
    # Check if the file exists
    if not os.path.exists(filename):
        # Download the image if the file does not exist
        !wget {url} -O {filename}
        print(f"Downloaded {filename}.")
    else:
        print(f"File {filename} already exists, no need to download.")

Example 1

# download an image for inference
#!wget https://www.wsha.org/wp-content/uploads/banner-diverse-group-of-people-2.jpg -O people.jpg
download_image("https://www.wsha.org/wp-content/uploads/banner-diverse-group-of-people-2.jpg", "people.jpg")

# use the api pipeline for object detection
# the threshold is set manually, the model sometimes predicts
# random structures as a potential object, so we set a threshold to keep objects 
# with better prediction scores.
object_detection_api('./people.jpg', threshold=0.8)


Example 2

#!wget https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/10best-cars-group-cropped-1542126037.jpg -O cars.jpg
download_image("https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/10best-cars-group-cropped-1542126037.jpg", "cars.jpg")
  
object_detection_api('./cars.jpg', rect_th=6, text_th=5, text_size=5)


Example 3

#!wget https://cdn.pixabay.com/photo/2013/07/05/01/08/traffic-143391_960_720.jpg -O traffic_scene.jpg
download_image("https://cdn.pixabay.com/photo/2013/07/05/01/08/traffic-143391_960_720.jpg", "traffic_scene.jpg")
  
object_detection_api('./traffic_scene.jpg', rect_th=2, text_th=1, text_size=1)


Example 4

#!wget https://images.unsplash.com/photo-1458169495136-854e4c39548a -O traffic_scene2.jpg
download_image("https://images.unsplash.com/photo-1458169495136-854e4c39548a", "traffic_scene2.jpg")
  
object_detection_api('./traffic_scene2.jpg', rect_th=15, text_th=7, text_size=5, threshold=0.8)  


4. Inference Time Comparison (CPU vs. GPU)
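
The timing code below calls a helper check_inference_time that is not listed in the article body. A minimal sketch, reconstructed from the traceback in section 6.2 and therefore an assumption rather than the original helper, could look like this:

import time

def check_inference_time(image_path, gpu=False):
    """Rough per-image inference time for the Faster R-CNN model (sketch)."""
    model.eval()
    img = Image.open(image_path)
    transform = T.Compose([T.ToTensor()])
    img = transform(img)
    if gpu:
        model.cuda()
        img = img.cuda()
    else:
        model.cpu()
        img = img.cpu()
    start_time = time.time()
    with torch.no_grad():
        model([img])
    return time.time() - start_time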

# Let's run inference on all the downloaded images and average their inference time 
img_paths = [path for path in os.listdir(".") if path.split(".")[-1].lower() in ["jpeg", "jpg", "png"] ]

gpu_time = sum([check_inference_time(img_path, gpu=True) for img_path in img_paths])/len(img_paths)
cpu_time = sum([check_inference_time(img_path, gpu=False) for img_path in img_paths])/len(img_paths)


print('\n\nAverage Time take by the model with GPU = {}s\nAverage Time take by the model with CPU = {}s'.format(gpu_time, cpu_time))

We measure the inference time of the model on CPU and on GPU as the time it takes the model to produce predictions for an input image, i.e. the time spent in the call model(image).

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at /opt/pytorch/aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv2d(input, weight, bias, self.stride,



Average Time take by the model with GPU = 1.0562188625335693s
Average Time take by the model with CPU = 7.606167674064636s

On the Jetson Orin Nano board, inference takes about 7.61 s per image on the CPU and about 1.06 s on the GPU.

Test code: 005 PyTorch Faster RCNN

5. References

【1】Colab/PyTorch - Getting Started with PyTorch

6. Supplementary Notes

6.1 cv2.rectangle argument error

Running the example code produced the following error:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
Cell In[4], line 9
      3 download_image("https://www.wsha.org/wp-content/uploads/banner-diverse-group-of-people-2.jpg", "people.jpg")
      5 # use the api pipeline for object detection
      6 # the threshold is set manually, the model sometimes predicts
      7 # random structures as a potential object, so we set a threshold to keep objects 
      8 # with better prediction scores.
----> 9 object_detection_api('./people.jpg', threshold=0.8)

Cell In[2], line 48, in object_detection_api(img_path, threshold, rect_th, text_size, text_th)
     46 img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
     47 for i in range(len(boxes)):
---> 48   cv2.rectangle(img, boxes[i][0], boxes[i][1], color=(0, 255, 0), thickness=rect_th)
     49   cv2.rectangle(img, (int(boxes[i][0][0]), int(boxes[i][0][1])), (int(boxes[i][1][0]), int(boxes[i][1][1])), color=(0, 255, 0), thickness=rect_th)
     50   cv2.putText(img,pred_cls[i], (int(boxes[i][0][0]), int(boxes[i][0][1])), cv2.FONT_HERSHEY_SIMPLEX, text_size, (0,255,0),thickness=text_th)

error: OpenCV(4.9.0) :-1: error: (-5:Bad argument) in function 'rectangle'
> Overload resolution failed:
>  - Can't parse 'pt1'. Sequence item with index 0 has a wrong type
>  - Can't parse 'pt1'. Sequence item with index 0 has a wrong type
>  - argument for rectangle() given by name ('color') and position (3)
>  - argument for rectangle() given by name ('color') and position (3)

Newer OpenCV Python bindings are stricter about the point arguments: cv2.rectangle no longer accepts the floating-point box coordinates returned by the model for pt1 and pt2, which is why the original call fails. Casting the coordinates to integer tuples fixes it, i.e. change cv2.rectangle(img, boxes[i][0], boxes[i][1], color=(0, 255, 0), thickness=rect_th) to cv2.rectangle(img, (int(boxes[i][0][0]), int(boxes[i][0][1])), (int(boxes[i][1][0]), int(boxes[i][1][1])), color=(0, 255, 0), thickness=rect_th).

6.2 AssertionError: Torch not compiled with CUDA enabled

The following problem came up at runtime:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[8], line 4
      1 # Let's run inference on all the downloaded images and average their inference time 
      2 img_paths = [path for path in os.listdir(".") if path.split(".")[-1].lower() in ["jpeg", "jpg", "png"] ]
----> 4 gpu_time = sum([check_inference_time(img_path, gpu=True) for img_path in img_paths])/len(img_paths)
      5 cpu_time = sum([check_inference_time(img_path, gpu=False) for img_path in img_paths])/len(img_paths)
      8 print('\n\nAverage Time take by the model with GPU = {}s\nAverage Time take by the model with CPU = {}s'.format(gpu_time, cpu_time))

Cell In[8], line 4, in <listcomp>(.0)
      1 # Let's run inference on all the downloaded images and average their inference time 
      2 img_paths = [path for path in os.listdir(".") if path.split(".")[-1].lower() in ["jpeg", "jpg", "png"] ]
----> 4 gpu_time = sum([check_inference_time(img_path, gpu=True) for img_path in img_paths])/len(img_paths)
      5 cpu_time = sum([check_inference_time(img_path, gpu=False) for img_path in img_paths])/len(img_paths)
      8 print('\n\nAverage Time take by the model with GPU = {}s\nAverage Time take by the model with CPU = {}s'.format(gpu_time, cpu_time))

Cell In[7], line 10, in check_inference_time(image_path, gpu)
      8 img = transform(img)
      9 if gpu:
---> 10   model.cuda()
     11   img = img.cuda()
     12 else:

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:915, in Module.cuda(self, device)
    898 def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
    899     r"""Move all model parameters and buffers to the GPU.
    900 
    901     This also makes associated parameters and buffers different objects. So
   (...)
    913         Module: self
    914     """
--> 915     return self._apply(lambda t: t.cuda(device))

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:779, in Module._apply(self, fn, recurse)
    777 if recurse:
    778     for module in self.children():
--> 779         module._apply(fn)
    781 def compute_should_use_set_data(tensor, tensor_applied):
    782     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    783         # If the new tensor has compatible tensor type as the existing tensor,
    784         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    789         # global flag to let the user control whether they want the future
    790         # behavior of overwriting the existing tensor or not.

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:779, in Module._apply(self, fn, recurse)
    777 if recurse:
    778     for module in self.children():
--> 779         module._apply(fn)
    781 def compute_should_use_set_data(tensor, tensor_applied):
    782     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    783         # If the new tensor has compatible tensor type as the existing tensor,
    784         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    789         # global flag to let the user control whether they want the future
    790         # behavior of overwriting the existing tensor or not.

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:779, in Module._apply(self, fn, recurse)
    777 if recurse:
    778     for module in self.children():
--> 779         module._apply(fn)
    781 def compute_should_use_set_data(tensor, tensor_applied):
    782     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    783         # If the new tensor has compatible tensor type as the existing tensor,
    784         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    789         # global flag to let the user control whether they want the future
    790         # behavior of overwriting the existing tensor or not.

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:804, in Module._apply(self, fn, recurse)
    800 # Tensors stored in modules are graph leaves, and we don't want to
    801 # track autograd history of `param_applied`, so we have to use
    802 # `with torch.no_grad():`
    803 with torch.no_grad():
--> 804     param_applied = fn(param)
    805 p_should_use_set_data = compute_should_use_set_data(param, param_applied)
    807 # subclasses may have multiple child tensors so we need to use swap_tensors

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:915, in Module.cuda.<locals>.<lambda>(t)
    898 def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
    899     r"""Move all model parameters and buffers to the GPU.
    900 
    901     This also makes associated parameters and buffers different objects. So
   (...)
    913         Module: self
    914     """
--> 915     return self._apply(lambda t: t.cuda(device))

File ~/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:284, in _lazy_init()
    279     raise RuntimeError(
    280         "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
    281         "multiprocessing, you must use the 'spawn' start method"
    282     )
    283 if not hasattr(torch._C, "_cuda_getDeviceCount"):
--> 284     raise AssertionError("Torch not compiled with CUDA enabled")
    285 if _cudart is None:
    286     raise AssertionError(
    287         "libcudart functions unavailable. It looks like you have a broken build?"
    288     )

AssertionError: Torch not compiled with CUDA enabled
The default pip wheel of torch is a CPU-only build, so the model cannot be moved to the GPU. The first attempt was to install NVIDIA's PyTorch wheel for JetPack:

$ export TORCH_INSTALL=https://developer.download.nvidia.cn/compute/redist/jp/v60dp/pytorch/torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl

$ python3 -m pip install --no-cache $TORCH_INSTALL
Collecting torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654
  Downloading https://developer.download.nvidia.cn/compute/redist/jp/v60dp/pytorch/torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl (1035.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 9.2 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (4.9.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (1.9)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.local/lib/python3.10/site-packages (from jinja2->torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (2.1.5)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.2.2
    Uninstalling torch-2.2.2:
      Successfully uninstalled torch-2.2.2
  Rolling back uninstall of torch
  Moving to /home/daniel/.local/bin/convert-caffe2-to-onnx
   from /tmp/pip-uninstall-43guaps2/convert-caffe2-to-onnx
  Moving to /home/daniel/.local/bin/convert-onnx-to-caffe2
   from /tmp/pip-uninstall-43guaps2/convert-onnx-to-caffe2
  Moving to /home/daniel/.local/bin/torchrun
   from /tmp/pip-uninstall-43guaps2/torchrun
  Moving to /home/daniel/.local/lib/python3.10/site-packages/functorch/
   from /home/daniel/.local/lib/python3.10/site-packages/~unctorch
  Moving to /home/daniel/.local/lib/python3.10/site-packages/torch-2.2.2.dist-info/
   from /home/daniel/.local/lib/python3.10/site-packages/~orch-2.2.2.dist-info
  Moving to /home/daniel/.local/lib/python3.10/site-packages/torch.libs/
   from /home/daniel/.local/lib/python3.10/site-packages/~orch.libs
  Moving to /home/daniel/.local/lib/python3.10/site-packages/torch/
   from /home/daniel/.local/lib/python3.10/site-packages/~orch
  Moving to /home/daniel/.local/lib/python3.10/site-packages/torchgen/
   from /home/daniel/.local/lib/python3.10/site-packages/~orchgen
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/convert-caffe2-to-onnx'
Consider using the `--user` option or check the permissions.

  • The following install with --user completes successfully, but the problem still remains (to be investigated):
$ python3 -m pip install --no-cache --user $TORCH_INSTALL
Collecting torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654
  Downloading https://developer.download.nvidia.cn/compute/redist/jp/v60dp/pytorch/torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl (1035.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 8.9 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (4.9.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (1.9)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.local/lib/python3.10/site-packages (from jinja2->torch==2.3.0a0+6ddf5cf85e.nv24.04.14026654) (2.1.5)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.2.2
    Uninstalling torch-2.2.2:
      Successfully uninstalled torch-2.2.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.2.2 requires torch==2.2.2, but you have torch 2.3.0a0+6ddf5cf85e.nv24.4 which is incompatible.
torchvision 0.17.2 requires torch==2.2.2, but you have torch 2.3.0a0+6ddf5cf85e.nv24.4 which is incompatible.
Successfully installed torch-2.3.0a0+6ddf5cf85e.nv24.4

6.3 Remaining CUDA PyTorch issue

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 5
      3 import matplotlib.pyplot as plt
      4 import torch
----> 5 import torchvision.transforms as T
      6 import torchvision
      7 import torch

File ~/.local/lib/python3.10/site-packages/torchvision/__init__.py:6
      3 from modulefinder import Module
      5 import torch
----> 6 from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
      8 from .extension import _HAS_OPS
     10 try:

File ~/.local/lib/python3.10/site-packages/torchvision/_meta_registrations.py:164
    153     torch._check(
    154         grad.dtype == rois.dtype,
    155         lambda: (
   (...)
    158         ),
    159     )
    160     return grad.new_empty((batch_size, channels, height, width))
    163 @torch._custom_ops.impl_abstract("torchvision::nms")
--> 164 def meta_nms(dets, scores, iou_threshold):
    165     torch._check(dets.dim() == 2, lambda: f"boxes should be a 2d tensor, got {dets.dim()}D")
    166     torch._check(dets.size(1) == 4, lambda: f"boxes should have 4 elements in dimension 1, got {dets.size(1)}")

File ~/.local/lib/python3.10/site-packages/torch/library.py:467, in impl_abstract.<locals>.inner(func)
    464 else:
    465     func_to_register = func
--> 467 handle = entry.abstract_impl.register(func_to_register, source)
    468 if lib is not None:
    469     lib._registration_handles.append(handle)

File ~/.local/lib/python3.10/site-packages/torch/_library/abstract_impl.py:30, in AbstractImplHolder.register(self, func, source)
     24 if self.kernel is not None:
     25     raise RuntimeError(
     26         f"impl_abstract(...): the operator {self.qualname} "
     27         f"already has an abstract impl registered at "
     28         f"{self.kernel.source}."
     29     )
---> 30 if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
     31     raise RuntimeError(
     32         f"impl_abstract(...): the operator {self.qualname} "
     33         f"already has an DispatchKey::Meta implementation via a "
   (...)
     36         f"impl_abstract."
     37     )
     39 if torch._C._dispatch_has_kernel_for_dispatch_key(
     40     self.qualname, "CompositeImplicitAutograd"
     41 ):

RuntimeError: operator torchvision::nms does not exist

After running the following command to restore the stock packages, the RuntimeError above disappears, but the problem from section 6.2 remains.

$ pip3 install torch==2.3.0 torchvision==0.18.0 --user -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==2.3.0
  Using cached https://download.pytorch.org/whl/cpu/torch-2.3.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (88.5 MB)
Collecting torchvision==0.18.0
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b2/dd/e5d39496413a5e5c2ca69d333bc241e7c8e8e412778c8309d54ce27cb9ec/torchvision-0.18.0-cp310-cp310-manylinux2014_aarch64.whl (14.0 MB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (4.9.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch==2.3.0) (1.9)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (2024.2.0)
Requirement already satisfied: numpy in ./.local/lib/python3.10/site-packages (from torchvision==0.18.0) (1.23.4)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/lib/python3/dist-packages (from torchvision==0.18.0) (9.0.1)
Requirement already satisfied: MarkupSafe>=2.0 in ./.local/lib/python3.10/site-packages (from jinja2->torch==2.3.0) (2.1.5)
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 2.3.0a0+6ddf5cf85e.nv24.4
    Uninstalling torch-2.3.0a0+6ddf5cf85e.nv24.4:
      Successfully uninstalled torch-2.3.0a0+6ddf5cf85e.nv24.4
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.17.2
    Uninstalling torchvision-0.17.2:
      Successfully uninstalled torchvision-0.17.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.2.2 requires torch==2.2.2, but you have torch 2.3.0 which is incompatible.
Successfully installed torch-2.3.0 torchvision-0.18.0

6.4 PyTorch for Jetson

Uninstall the previously installed torch and torchvision, then install the locally downloaded aarch64 wheels built for Jetson (the PyTorch for Jetson wheels published by NVIDIA):

$ pip3 uninstall torchvision torch
Found existing installation: torchvision 0.18.0
Uninstalling torchvision-0.18.0:
  Would remove:
    /home/daniel/.local/lib/python3.10/site-packages/torchvision-0.18.0.dist-info/*
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/ld-linux-aarch64.514b5772.so.1
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libavcodec.8c7e9066.so.58
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libavformat.f982c100.so.58
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libavutil.0fa1cd1a.so.56
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libbz2.98aee962.so.1.0
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libgnutls.a6162a9d.so.30
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libhogweed.ea0b2e46.so.6
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libiconv.b05ac3d8.so.2
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libjpeg.fe3c2a39.so.8
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libmp3lame.e7636c4b.so.0
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libnettle.224b64e4.so.8
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libopenh264.96de4a3e.so.6
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libpng16.d8404d82.so.16
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libswresample.099dd314.so.3
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libswscale.6acab513.so.5
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libx264.21f46178.so.152
    /home/daniel/.local/lib/python3.10/site-packages/torchvision.libs/libz.154608c1.so.1
    /home/daniel/.local/lib/python3.10/site-packages/torchvision/*
Proceed (Y/n)? Y
  Successfully uninstalled torchvision-0.18.0
Found existing installation: torch 2.3.0
Uninstalling torch-2.3.0:
  Would remove:
    /home/daniel/.local/bin/convert-caffe2-to-onnx
    /home/daniel/.local/bin/convert-onnx-to-caffe2
    /home/daniel/.local/bin/torchrun
    /home/daniel/.local/lib/python3.10/site-packages/functorch/*
    /home/daniel/.local/lib/python3.10/site-packages/torch-2.3.0.dist-info/*
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libarm_compute-ed97c1b0.so
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libarm_compute_core-0793f69d.so
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libarm_compute_graph-15f701fb.so
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libgfortran-105e6576.so.5.0.0
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libomp-b8e5bcfb.so
    /home/daniel/.local/lib/python3.10/site-packages/torch.libs/libopenblasp-r0-f658af2e.3.25.so
    /home/daniel/.local/lib/python3.10/site-packages/torch/*
    /home/daniel/.local/lib/python3.10/site-packages/torchgen/*
Proceed (Y/n)? Y
  Successfully uninstalled torch-2.3.0

$ sudo python3 -m pip install torch-2.3.0-cp310-cp310-linux_aarch64.whl
Processing ./torch-2.3.0-cp310-cp310-linux_aarch64.whl
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (4.9.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch==2.3.0) (1.9)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.3.0) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.3.0) (2.1.5)
Installing collected packages: torch
Successfully installed torch-2.3.0

$ sudo python3 -m pip install torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
Processing ./torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from torchvision==0.18.0a0+6043bc2) (1.26.4)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from torchvision==0.18.0a0+6043bc2) (2.3.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/lib/python3/dist-packages (from torchvision==0.18.0a0+6043bc2) (9.0.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (4.9.0)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (1.9)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->torchvision==0.18.0a0+6043bc2) (2024.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->torchvision==0.18.0a0+6043bc2) (2.1.5)
Installing collected packages: torchvision
Successfully installed torchvision-0.18.0a0+6043bc2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv