Intel Realsense D435结合YoloV5训练的模型进行实时目标识别并获取目标物三维坐标-精简版

悲喜自度在线捏对象

已于 2024-09-13 10:29:30 修改

阅读量360

点赞数 9

分类专栏：深度学习文章标签： YOLO python opencv 计算机视觉目标检测深度学习人工智能

于 2024-09-13 10:11:12 首次发布

本文链接：https://blog.csdn.net/qq_35257559/article/details/142129036

版权

深度学习专栏收录该内容

8 篇文章 1 订阅

订阅专栏

1.效果视频和效果图

在这里插入图片描述

目标物三维坐标检测+深度信息效果

2.代码教程

第一步：准备工作(下面列举的是博主的环境配置)

（1）安装Anaconda(Anaconda3-2021.04)、CUDA(cuda_11.7.1_516.94)、Pycharm
（2）创建虚拟环境：conda create -n visual_intel python=3.10
（3）激活虚拟环境：conda activate visual_intel
（4）安装依赖包：torch(1.13.1+cu117-cp310)、torchaudio(0.13.1+cu117-cp310)、torchvision(0.14.1+cu117-cp310)、opencv-python(4.10.0.84)、pyrealsense2(2.55.16486)、numpy(2.1.0)注意：这里如果报numpy相关的错误，大概率版本问题，可以将numpy降级到1.26.4,大环境就这些，运行过程中会报一些xxx模块没找到，少那个模块直接pip install xxx就可以了
（5）一个训练好的pt模型
（6）我的目录结构（注意：因为我这里做了和机械臂的交互，下面的代码部分我把机械臂交互的地方删掉了）
在这里插入图片描述

第二步：引入相关包

# coding=utf8
import torch
import cv2
from utils.midpos import get_mid_pos
import pyrealsense2 as rs
import numpy as np
import json

第三步：模型参数和intel realsense相机配置

下面的这部分用来进行加载pt模型，设置模型检测的置信度，配置RealSense的管道等，说明一下，我这里torch.hub.load采用的是本地模型加载的方式，如果加载线上的一些模型，这里要做一些修改

'''
模型路径和置信度配置
'''
model = torch.hub.load('./', 'custom', 'small.pt', source='local')
model.conf = 0.5

'''
realsense连接配置
'''
# 创建一个RealSense pipeline
pipeline = rs.pipeline()
# 配置管道
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
# 启动管道
profile = pipeline.start(config)
# 创建一个align对象，用于将深度流和颜色流对齐
align = rs.align(rs.stream.color)

第四步：实时目标识别和坐标获取函数

下面的这部分代码提供了以下功能：
（1）用来实时获取目标物的三维坐标（坐标是目标物的中心点）
（2）可获取相机内参并保存到json文件（这个是我之前做标定用的）
一般相机内参矩阵是下面的K矩阵，这个矩阵的c(x)和c(y)对应的就是intel realsense相机的ppx和ppy
在这里插入图片描述

def get_target_coordinate():
    # 目标物中心点初始化
    x3 = 0
    y3 = 0

    try:
        while True:
            frames = pipeline.wait_for_frames()  # 等待获取图像帧
            aligned_frames = align.process(frames)  # 获取对齐帧
            depth_frame = aligned_frames.get_depth_frame()  # 获取对齐帧中的depth帧
            color_frame = aligned_frames.get_color_frame()  # 获取对齐帧中的color帧

            # 相机参数的获取
            intrinsic = color_frame.profile.as_video_stream_profile().intrinsics  # 获取相机内参
            depth_intrinsic = depth_frame.profile.as_video_stream_profile().intrinsics  # 获取深度参数（像素坐标系转相机坐标系会用到）
            camera_parameters = {'fx': intrinsic.fx, 'fy': intrinsic.fy, 'ppx': intrinsic.ppx, 'ppy': intrinsic.ppy,
                                 'height': intrinsic.height, 'width': intrinsic.width,
                                 'depth_scale': profile.get_device().first_depth_sensor().get_depth_scale()
                                 }
            # 保存内参到本地
            with open('./intrinsics.json', 'w') as fp:
                json.dump(camera_parameters, fp)

            # 识别部分
            depth_image = np.asanyarray(depth_frame.get_data())
            color_image = np.asanyarray(color_frame.get_data())
            results = model(color_image)
            boxs = results.pandas().xyxy[0].values
            for box in boxs:
                # print(str(box)," 检测到的目标：", str(box[6]))
                cv2.rectangle(color_image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
                dist = get_mid_pos(color_image, box, depth_image, 24)
                # 获取目标物的距离相机的距离
                distance = box[-1] + str(dist / 1000)[:4] + 'm'
                # 获取目标物的相关坐标，左上、右下和中心点
                x1, y1, x2, y2, x3, y3 = box[0], box[1], box[2], box[3], (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
                print(f'目标物坐标(右上:x1={x1},y1={y1})(左下:x2={x2},y2={y2})(中心:x3={x3},y3={y3})')
                # 如果绘制的文字超出了画面最上边的显示的范围，进行调整
                if box[1] < 10:
                    cv2.putText(color_image, distance, (int(box[0]), int(box[1]) + 30), cv2.FONT_HERSHEY_SIMPLEX, 1,(255, 255, 255), 2)
                else:
                    cv2.putText(color_image, distance, (int(box[0]), int(box[1])), cv2.FONT_HERSHEY_SIMPLEX, 1,(255, 255, 255), 2)

            depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET)
            # 拼接RGB图和深度图
            images = np.hstack((color_image, depth_colormap))

            # 定义需要得到真实三维信息的像素点（x, y)
            x = int(x3)
            y = int(y3)
            dis = depth_frame.get_distance(x, y)  # （x, y)点的真实深度值
            # （x,y)点在相机坐标系下的真实值为一个三维向量, 其中camera_coordinate[2]仍为dis，camera_coordinate[0]和camera_coordinate[1]为相机坐标系下的xy真实距离。
            camera_coordinate = rs.rs2_deproject_pixel_to_point(depth_intrinsic, [x, y], dis)
            print("目标物三维坐标(x,y,z)：", str(camera_coordinate))
			
			# 画面显示
            cv2.imshow('RGB image', images)
            key = cv2.waitKey(1)
            # 按q或者esc退出
            if key & 0xFF == ord('q') or key == 27:
                break
        cv2.destroyAllWindows()
    finally:
        pipeline.stop()


if __name__ == '__main__':
    get_target_coordinate()

模型的训练大家可以搜一下yolov5训练自己的模型，这个文章下一期更新一个web版的
完整代码链接：通过网盘分享的文件：Yolo+realsense实时获取目标物三维坐标代码
链接: https://pan.baidu.com/s/1onxsdzVtv4G0-RHuyub-HQ?pwd=1112 提取码: 1112