Jetson Xavier NX: Installing Ubuntu 20.04, Setting Up CUDA, cuDNN, and PyTorch, and Running Real-Time Camera Object Detection with TensorRT-Accelerated YOLOv5

I. Flashing the Jetson Xavier NX and Setting Up the Base Environment

The board was handed down from a senior student in our lab, so the first thing I did after receiving it was reflash it.

1. Preparation

Prepare a computer or virtual machine running Ubuntu and install NVIDIA SDK Manager on it (download: SDK Manager | NVIDIA Developer). After downloading the .deb package, install it with the following command (fill in the ellipsis with the actual version string):

sudo apt install ./sdkmanager_....._amd64.deb

Also prepare a micro-USB to USB data cable, a power cable, and female-to-female Dupont jumper wires.

2. Flashing

Use a jumper wire to short the Xavier NX's third pin (FC REC) to its second pin (GND); the board will then enter recovery mode on power-up.

Plug in the power supply to power the board, then connect the NX's micro-USB port to a USB port on the host computer.

Open NVIDIA SDK Manager on the host and log in; it will automatically detect the connected board. "P3668-0001 or P3668-0003 module" is the eMMC version. My board is that version, so I chose the first option. The JetPack version below can be chosen freely; the Ubuntu release corresponding to each JetPack version is listed on the official site (JetPack Archive | NVIDIA Developer). Uncheck DeepStream and Host Machine.


Install only the JetPack OS for now; the other SDK components will be installed after the system has been migrated to the SSD (the Jetson Xavier NX's built-in 16 GB of eMMC storage is not enough to hold them).


Then just wait. The download speed depends on your network, and the installation that follows takes roughly 10 to 15 minutes. During this time you will be prompted to set a username and password; follow the prompts as needed. I set only a username and password.

When flashing completes, SDK Manager shows a confirmation prompt.

3. Expanding Storage and Setting the SSD as the Boot Device

The board's built-in storage is only 16 GB, so vendors usually supply an SSD for expansion. After installing the SSD, boot the NX into Ubuntu, open the Disks utility, click the menu in the upper-right corner, and format the SSD. No options need to be changed; just format it.

Click the + button in the lower left to create partitions, adding a 16 GB partition to serve as swap space; rename it and create it.

The final layout should show the new data and swap partitions; click the triangle (play) button to mount and activate them.
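
To confirm that the swap partition is actually active after clicking the play button, a quick look at /proc/swaps works. This is a sanity-check sketch, not a required step (the same information is available via swapon --show):

# List active swap devices; the new 16 GB SSD partition should appear here.
with open("/proc/swaps") as f:
    print(f.read())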


Clone the latest rootOnNVMe project from GitHub into the home directory, run the script that copies the root filesystem to the SSD, then run the script that enables booting from the SSD, and finally reboot for the change to take effect.

git clone https://github.com/jetsonhacks/rootOnNVMe.git
cd rootOnNVMe
./copy-rootfs-ssd.sh
./setup-service.sh
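
After the reboot, you can verify that the root filesystem now lives on the NVMe SSD. Below is a minimal Python check; it assumes the SSD partition shows up under the usual /dev/nvme0n1p1 name, so adjust if yours differs:

# Report which block device backs the root filesystem.
with open("/proc/mounts") as f:
    for line in f:
        device, mountpoint = line.split()[:2]
        if mountpoint == "/":
            # Expect something like /dev/nvme0n1p1 after the migration.
            print("root filesystem device:", device)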

4. Installing CUDA, cuDNN, and the Other Jetson SDK Components

Now install CUDA and the remaining components. Remove the jumper wire but leave the USB cable connected, open SDK Manager on the Ubuntu host again, and confirm the JetPack version number. It must match the version that was flashed, or the installation will fail. Click Continue to go to the next step.

On the next screen, uncheck the OS and check only the components. SDK Manager will automatically install the versions of CUDA, cuDNN, OpenCV (a build without CUDA acceleration), VPI, and the other required components that match the JetPack release. Accept the license agreement in the lower left and click Continue.

You will then be asked for a username and password; enter the ones set earlier. The IP address defaults to 192.168.55.1. Select the USB option and start the download. If red error text appears, just wait until the board has fully booted and connected. The installation that follows takes quite a while.

5. Installing PyTorch and torchvision

Download the JetPack-specific PyTorch wheel, which NVIDIA provides on the official forum (PyTorch for Jetson - Announcements - NVIDIA Developer Forums) along with complete, up-to-date installation and verification instructions. Download the PyTorch build matching your Python version, open a terminal in the folder containing the downloaded wheel, and run:

pip3 install torch-......whl

Then download the matching torchvision release from the GitHub repository (GitHub - pytorch/vision: Datasets, Transforms and Models specific to Computer Vision); each release states which PyTorch version it pairs with. Download and extract it, enter the project folder, and run the following to install:

python3 setup.py install
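
A quick sanity check that the two installs actually pair up (plain version queries, nothing JetPack-specific):

import torch
import torchvision

# The printed versions must match the compatibility table in the
# pytorch/vision README (each torchvision release targets one torch release).
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)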

Next, configure the environment variables by running:

gedit  ~/.bashrc 

and appending the following lines at the end of the file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_ROOT=/usr/local/cuda

Save and exit; the changes take effect in newly opened terminal windows (or run source ~/.bashrc in the current one).

Open a new terminal window and enter python3.

Then run the following to verify that PyTorch installed successfully and that CUDA is available:

import torch
print(torch.__version__)
print(torch.cuda.is_available())
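
If that prints True, a slightly deeper check confirms which GPU PyTorch sees and which cuDNN version JetPack bundled (all standard PyTorch APIs):

import torch

print(torch.cuda.get_device_name(0))   # the Xavier NX's integrated Volta GPU
print(torch.backends.cudnn.version())  # cuDNN version shipped with JetPack
x = torch.rand(3, 3).cuda()            # tiny GPU computation as a smoke test
print((x @ x).cpu())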

This completes the flashing and base environment setup for the Jetson Xavier NX.

Tip: if an error like "sudo: 3 incorrect password attempts" appears while installing the Jetson SDK Components, try checking both Jetson Linux and Jetson SDK Components when flashing. After Jetson Linux finishes flashing, the board boots into the system on its own and the host then automatically proceeds to install the SDK components. The username/password verification window pops up again, pre-filled this time. Leave it alone for now: do not close the window or unplug the USB cable; remove only the jumper wire. Then carry out the storage expansion and SSD-boot setup on the board, let it reboot and reconnect to the host automatically, and click Next.

II. Real-Time Camera Object Detection with TensorRT-Accelerated YOLOv5

1. Cloning the Projects (v5.0)

Note that both repositories must be checked out at their v5.0 releases:

git clone -b v5.0 https://github.com/ultralytics/yolov5.git 
git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git

2. Generating the .engine File

Download yolov5s.pt (v5.0) into the weights folder of the yolov5 project.
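
If you prefer to fetch the weights from a terminal, the snippet below (run from the yolov5 project root) follows the usual ultralytics release-asset URL pattern; treat the URL as an assumption and verify it against the v5.0 release page if the download fails:

import urllib.request

# Assumed v5.0 release-asset URL; check the ultralytics/yolov5 v5.0
# release page if this 404s.
url = "https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt"
urllib.request.urlretrieve(url, "weights/yolov5s.pt")
print("saved weights/yolov5s.pt")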

Copy the gen_wts.py file from the tensorrtx/yolov5 folder into the yolov5 project.

Generate the .wts file from the .pt file:

python3 gen_wts.py -w weights/yolov5s.pt

Create a build folder under tensorrtx/yolov5 and copy the newly generated .wts file into it, then open a terminal in the build folder and run the following to build the .engine file from the .wts file:

cmake ..
make
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s

Install any libraries reported as missing during this process.
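
Before moving on to the camera script, you can optionally confirm that the engine deserializes cleanly. This sketch, run from the build folder, uses only standard TensorRT Python calls (the binding-introspection API as it existed in JetPack-era TensorRT releases):

import ctypes
import tensorrt as trt

ctypes.CDLL("./libmyplugins.so")  # the custom YOLO-layer plugin built above
logger = trt.Logger(trt.Logger.INFO)
with open("yolov5s.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
# One input binding ("data") and one output binding ("prob") are expected.
print("bindings:", [engine.get_binding_name(i) for i in range(engine.num_bindings)])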

3. Real-Time Detection with a USB Camera

Create a new file named yolov5_trt2.py in the tensorrtx/yolov5 folder and copy the v5.0 code below into it. Adjust cap = cv2.VideoCapture(0) to match your camera's device index; my USB camera is device 0.
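
If you are not sure which index the camera has, a throwaway probe like the one below (an illustrative sketch, separate from the detection script) reports which /dev/video* indices deliver frames:

import cv2

# Try the first few video device indices and report which ones return a frame.
for idx in range(4):
    cap = cv2.VideoCapture(idx)
    ok, _ = cap.read()
    print("camera index", idx, "->", "OK" if ok else "no frame")
    cap.release()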

"""
An example that uses TensorRT's Python api to make inferences.
"""
import ctypes
import os
import shutil
import random
import sys
import threading
import time
import cv2
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import torch
import torchvision
import argparse
 
CONF_THRESH = 0.5
IOU_THRESHOLD = 0.4
 
 
def get_img_path_batches(batch_size, img_dir):
    ret = []
    batch = []
    for root, dirs, files in os.walk(img_dir):
        for name in files:
            if len(batch) == batch_size:
                ret.append(batch)
                batch = []
            batch.append(os.path.join(root, name))
    if len(batch) > 0:
        ret.append(batch)
    return ret
 
def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    """
    description: Plots one bounding box on image img,
                 this function comes from YoLov5 project.
    param: 
        x:      a box like [x1,y1,x2,y2]
        img:    an opencv image object
        color:  color to draw rectangle, such as (0,255,0)
        label:  str
        line_thickness: int
    return:
        no return
    """
    tl = (
        line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1
    )  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(
            img,
            label,
            (c1[0], c1[1] - 2),
            0,
            tl / 3,
            [225, 255, 255],
            thickness=tf,
            lineType=cv2.LINE_AA,
        )
 
 
class YoLov5TRT(object):
    """
    description: A YOLOv5 class that wraps TensorRT ops, preprocess and postprocess ops.
    """
 
    def __init__(self, engine_file_path):
        # Create a Context on this device,
        self.ctx = cuda.Device(0).make_context()
        stream = cuda.Stream()
        TRT_LOGGER = trt.Logger(trt.Logger.INFO)
        runtime = trt.Runtime(TRT_LOGGER)
 
        # Deserialize the engine from file
        with open(engine_file_path, "rb") as f:
            engine = runtime.deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()
 
        host_inputs = []
        cuda_inputs = []
        host_outputs = []
        cuda_outputs = []
        bindings = []
 
        for binding in engine:
            print('binding:', binding, engine.get_binding_shape(binding))
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            # Allocate host and device buffers
            host_mem = cuda.pagelocked_empty(size, dtype)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer to device bindings.
            bindings.append(int(cuda_mem))
            # Append to the appropriate list.
            if engine.binding_is_input(binding):
                self.input_w = engine.get_binding_shape(binding)[-1]
                self.input_h = engine.get_binding_shape(binding)[-2]
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)
 
        # Store
        self.stream = stream
        self.context = context
        self.engine = engine
        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings
        self.batch_size = engine.max_batch_size
 
    def infer(self, input_image_path):
        # Make self the active context, pushing it on top of the context stack.
        self.ctx.push()
        self.input_image_path = input_image_path
        # Restore
        stream = self.stream
        context = self.context
        engine = self.engine
        host_inputs = self.host_inputs
        cuda_inputs = self.cuda_inputs
        host_outputs = self.host_outputs
        cuda_outputs = self.cuda_outputs
        bindings = self.bindings
        # Do image preprocess
        batch_image_raw = []
        batch_origin_h = []
        batch_origin_w = []
        batch_input_image = np.empty(shape=[self.batch_size, 3, self.input_h, self.input_w])
 
        input_image, image_raw, origin_h, origin_w = self.preprocess_image(input_image_path)

        batch_origin_h.append(origin_h)
        batch_origin_w.append(origin_w)
        np.copyto(batch_input_image, input_image)
        batch_input_image = np.ascontiguousarray(batch_input_image)
 
        # Copy input image to host buffer
        np.copyto(host_inputs[0], batch_input_image.ravel())
        start = time.time()
        # Transfer input data  to the GPU.
        cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
        # Run inference.
        context.execute_async(batch_size=self.batch_size, bindings=bindings, stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
        # Synchronize the stream
        stream.synchronize()
        end = time.time()
        # Remove any context from the top of the context stack, deactivating it.
        self.ctx.pop()
        # Here we use the first row of output in that batch_size = 1
        output = host_outputs[0]
        # Do postprocess
        result_boxes, result_scores, result_classid = self.post_process(
            output, origin_h, origin_w)
        # Draw rectangles and labels on the original image
        for j in range(len(result_boxes)):
            box = result_boxes[j]
            plot_one_box(
                box,
                image_raw,
                label="{}:{:.2f}".format(
                    categories[int(result_classid[j])], result_scores[j]
                ),
            )
        return image_raw, end - start
 
    def destroy(self):
        # Remove any context from the top of the context stack, deactivating it.
        self.ctx.pop()
        
    def get_raw_image(self, image_path_batch):
        """
        description: Read an image from image path
        """
        for img_path in image_path_batch:
            yield cv2.imread(img_path)
        
    def get_raw_image_zeros(self, image_path_batch=None):
        """
        description: Ready data for warmup
        """
        for _ in range(self.batch_size):
            yield np.zeros([self.input_h, self.input_w, 3], dtype=np.uint8)
 
    def preprocess_image(self, input_image_path):
        """
        description: Convert BGR image to RGB,
                     resize and pad it to target size, normalize to [0,1],
                     transform to NCHW format.
        param:
            input_image_path: an OpenCV BGR image (ndarray); despite the name,
                              the camera frame itself is passed in, not a path
        return:
            image:  the processed image
            image_raw: the original image
            h: original height
            w: original width
        """
        image_raw = input_image_path
        h, w, c = image_raw.shape
        image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
        # Calculate width, height, and paddings
        r_w = self.input_w / w
        r_h = self.input_h / h
        if r_h > r_w:
            tw = self.input_w
            th = int(r_w * h)
            tx1 = tx2 = 0
            ty1 = int((self.input_h - th) / 2)
            ty2 = self.input_h - th - ty1
        else:
            tw = int(r_h * w)
            th = self.input_h
            tx1 = int((self.input_w - tw) / 2)
            tx2 = self.input_w - tw - tx1
            ty1 = ty2 = 0
        # Resize the image with long side while maintaining ratio
        image = cv2.resize(image, (tw, th))
        # Pad the short side with (128,128,128)
        image = cv2.copyMakeBorder(
            image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, value=(128, 128, 128)
        )
        image = image.astype(np.float32)
        # Normalize to [0,1]
        image /= 255.0
        # HWC to CHW format:
        image = np.transpose(image, [2, 0, 1])
        # CHW to NCHW format
        image = np.expand_dims(image, axis=0)
        # Convert the image to row-major order, also known as "C order":
        image = np.ascontiguousarray(image)
        return image, image_raw, h, w
 
    def xywh2xyxy(self, origin_h, origin_w, x):
        """
        description:    Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
        param:
            origin_h:   height of original image
            origin_w:   width of original image
            x:          A boxes tensor, each row is a box [center_x, center_y, w, h]
        return:
            y:          A boxes tensor, each row is a box [x1, y1, x2, y2]
        """
        y = torch.zeros_like(x) if isinstance(x, torch.Tensor) else np.zeros_like(x)
        r_w = self.input_w / origin_w
        r_h = self.input_h / origin_h
        if r_h > r_w:
            y[:, 0] = x[:, 0] - x[:, 2] / 2
            y[:, 2] = x[:, 0] + x[:, 2] / 2
            y[:, 1] = x[:, 1] - x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2
            y[:, 3] = x[:, 1] + x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2
            y /= r_w
        else:
            y[:, 0] = x[:, 0] - x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2
            y[:, 2] = x[:, 0] + x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2
            y[:, 1] = x[:, 1] - x[:, 3] / 2
            y[:, 3] = x[:, 1] + x[:, 3] / 2
            y /= r_h
 
        return y
 
    def post_process(self, output, origin_h, origin_w):
        """
        description: postprocess the prediction
        param:
            output:     A tensor like [num_boxes,cx,cy,w,h,conf,cls_id, cx,cy,w,h,conf,cls_id, ...]
            origin_h:   height of original image
            origin_w:   width of original image
        return:
            result_boxes: final boxes, a boxes tensor, each row is a box [x1, y1, x2, y2]
            result_scores: final scores, a tensor, each element is the score corresponding to a box
            result_classid: final classid, a tensor, each element is the classid corresponding to a box
        """
        # Get the num of boxes detected
        num = int(output[0])
        # Reshape to a two-dimensional ndarray
        pred = np.reshape(output[1:], (-1, 6))[:num, :]
        # to a torch Tensor
        pred = torch.Tensor(pred).cuda()
        # Get the boxes
        boxes = pred[:, :4]
        # Get the scores
        scores = pred[:, 4]
        # Get the classid
        classid = pred[:, 5]
        # Choose those boxes that score > CONF_THRESH
        si = scores > CONF_THRESH
        boxes = boxes[si, :]
        scores = scores[si]
        classid = classid[si]
        # Transform bbox from [center_x, center_y, w, h] to [x1, y1, x2, y2]
        boxes = self.xywh2xyxy(origin_h, origin_w, boxes)
        # Do nms
        indices = torchvision.ops.nms(boxes, scores, iou_threshold=IOU_THRESHOLD).cpu()
        result_boxes = boxes[indices, :].cpu()
        result_scores = scores[indices].cpu()
        result_classid = classid[indices].cpu()
        return result_boxes, result_scores, result_classid
 
 
class inferThread(threading.Thread):
    def __init__(self, yolov5_wrapper):
        threading.Thread.__init__(self)
        self.yolov5_wrapper = yolov5_wrapper
    def infer(self, frame):
        # Run inference on a single camera frame and return the annotated
        # frame plus the inference time in seconds.
        batch_image_raw, use_time = self.yolov5_wrapper.infer(frame)
        return batch_image_raw, use_time
 
class warmUpThread(threading.Thread):
    def __init__(self, yolov5_wrapper):
        threading.Thread.__init__(self)
        self.yolov5_wrapper = yolov5_wrapper
 
    def run(self):
        # Warm the engine up with an all-zeros dummy image.
        image_raw, use_time = self.yolov5_wrapper.infer(next(self.yolov5_wrapper.get_raw_image_zeros()))
        print('warm_up->{}, time->{:.2f}ms'.format(image_raw.shape, use_time * 1000))
 
 
 
if __name__ == "__main__":
    # load custom plugins
    parser = argparse.ArgumentParser()
    parser.add_argument('--engine', type=str, default="build/yolov5s.engine", help='.engine path')
    parser.add_argument('--save', type=int, default=0, help='save?')
    opt = parser.parse_args()
    PLUGIN_LIBRARY = "build/libmyplugins.so"
    engine_file_path = opt.engine
 
    ctypes.CDLL(PLUGIN_LIBRARY)
 
    # load coco labels
 
    categories = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
            "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
            "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
            "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
            "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
            "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
            "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
            "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
            "hair drier", "toothbrush"]
    # a YoLov5TRT instance
    yolov5_wrapper = YoLov5TRT(engine_file_path)
    cap = cv2.VideoCapture(0)
    try:
        thread1 = inferThread(yolov5_wrapper)
        thread1.start()
        thread1.join()
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            img, t = thread1.infer(frame)
            fps = 1 / t  # frame rate estimated from the inference time
            cv2.putText(img, "FPS= %.2f" % fps, (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.imshow("result", img)
            if cv2.waitKey(1) & 0xFF == ord('q'):  # quit on 'q'
                break
 
 
    finally:
        # destroy the instance
        cap.release()
        cv2.destroyAllWindows()
        yolov5_wrapper.destroy()

Then open a terminal in the tensorrtx/yolov5 folder (so that the relative paths build/libmyplugins.so and build/yolov5s.engine resolve) and run:

python3 yolov5_trt2.py

It takes roughly ten to fifteen seconds before detection starts running.

