OpenVINO Series 6. Monocular Depth Estimation with an Image as Input

破浪会有时

Category: OpenVINO case studies. Tags: openvino, machine learning

Article link: https://blog.csdn.net/zyctimes/article/details/124450862

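Summary (generated automatically by CSDN): This article shows how to perform monocular depth estimation with OpenVINO and the MidasNet model. On Windows 10, using VSCode and OpenVINO 2022.1, an image is preprocessed and passed to the model for inference to obtain a depth estimate. MidasNet is a deep learning model trained for depth estimation across datasets, which improves its generalization. Finally, the result is converted to a color image and visualized.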

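This example demonstrates monocular depth estimation with MidasNet in OpenVINO, for the case where the input is a single image. Model information is linked from the original post.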

Environment:

Operating system: Windows 10
IDE: VSCode
OpenVINO version: 2022.1
Code link: 3-monodepth-imaging

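Basic concepts of monocular depth estimation

Depth estimation means inferring the depth of the objects in a scene from an RGB image, a difficult step from 2D to 3D. When it comes to measuring distance, stereo cameras and LiDAR come to mind first. Dedicated depth hardware has its own trade-offs: bulk (ToF), high power consumption (Kinect ships with a cooling system), sensitivity to the environment (infrared light in sunlight), high algorithmic complexity, and limited real-time performance (ToF is the fastest but less accurate). Monocular depth estimation has an inherent weakness: a single camera cannot measure exact distance directly. As algorithms have improved, however, deep learning can compensate for part of this hardware limitation, and the estimated depth also provides extra features for other vision tasks such as semantic segmentation and object recognition.

Even with one eye closed, we can still judge how far away the objects in front of us are. The hope is that a machine can learn, much as the human brain does, to estimate distance from a 2D image.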

A brief introduction to MidasNet

This demo uses a neural network model called MiDaS. The paper:

R. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, “Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967.

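The paper proposes a supervised depth estimation method. Its strategy can be summarized as follows:
1) Train on a mix of several depth datasets (each with its own scale and shift) to increase the amount of data and to cover complementary scenes;
2) Propose a scale- and shift-invariant loss function to supervise depth regression, so that the existing data can be used more effectively;
3) Expand the training data with samples taken from 3D movies, further increasing the data volume;
4) Use a principled multi-objective training method to combine the datasets, which gives a more effective optimization.
With these strategies, the resulting model generalizes well and is far less tied to the scenes of individual public datasets than earlier models.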
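To make the idea behind point 2 concrete, here is a minimal NumPy sketch of a loss that ignores the global scale and shift of a prediction. It is not the paper's implementation: the function names `align_scale_shift` and `scale_shift_invariant_mse` are hypothetical, and the plain mean squared error stands in for the more elaborate loss the paper describes. The prediction is first aligned to the target with a closed-form least-squares fit, and only the remaining error is measured.

```python
import numpy as np


def align_scale_shift(prediction, target):
    """Least-squares scale s and shift t such that s * prediction + t ~ target."""
    p = prediction.ravel().astype(np.float64)
    d = target.ravel().astype(np.float64)
    # Design matrix: one column for the scale, one column of ones for the shift.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d, rcond=None)
    return s, t


def scale_shift_invariant_mse(prediction, target):
    """Mean squared error after removing the best global scale and shift."""
    s, t = align_scale_shift(prediction, target)
    aligned = s * prediction + t
    return float(np.mean((aligned - target) ** 2))


# Illustrative data: the target differs from the prediction only by scale and shift.
rng = np.random.default_rng(0)
pred = rng.random((4, 4))
target = 2.0 * pred + 3.0
loss = scale_shift_invariant_mse(pred, target)
print(loss)  # close to zero, because scale and shift are factored out
```

Because the fit absorbs any global scale and shift, datasets whose depth annotations use different units or offsets can contribute to the same training objective.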

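Applying monocular depth estimation to an image

Overall code logic:

First, read the model (ie.read_model) and compile it (ie.compile_model);
Second, read the image with OpenCV, resize it to the network input size, and reshape it to (N,C,H,W) (N = number of images, C = number of channels, H = height, W = width);
Third, run inference (compiled_model([input_image])[output_key]). The result has the same size as the model output. We then convert the output to an RGB image (with the function convert_result_to_image), resize it back to the size of the original input image, and visualize the result.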
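The layout change in the second step can be sketched with NumPy alone. In the sketch below, a small dummy array stands in for an image loaded by OpenCV, and `hwc_to_nchw` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def hwc_to_nchw(image):
    """Turn one (H, W, C) image into a (1, C, H, W) batch."""
    # Move the channel axis to the front: (H, W, C) -> (C, H, W).
    chw = np.transpose(image, (2, 0, 1))
    # Add a leading batch axis of size 1: (C, H, W) -> (1, C, H, W).
    return np.expand_dims(chw, 0)


# Dummy image with height 4, width 5 and 3 channels.
image = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)
batch = hwc_to_nchw(image)
print(batch.shape)  # (1, 3, 4, 5)
```

The pixel at row y, column x, channel c of the original image ends up at `batch[0, c, y, x]`, so no pixel values change, only the axis order.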

The code is as follows:

from pathlib import Path

import cv2
import matplotlib.cm
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

DEVICE = "CPU"
MODEL_FILE = "model/MiDaS_small.xml"

model_xml_path = Path(MODEL_FILE)

def normalize_minmax(data):
    """
    Normalizes the values in `data` between 0 and 1
    """
    return (data - data.min()) / (data.max() - data.min())


def convert_result_to_image(result, colormap="viridis"):
    """
    Convert network result of floating point numbers to an RGB image with
    integer values from 0-255 by applying a colormap.

    `result` is expected to be a single network result in 1,H,W shape
    `colormap` is a matplotlib colormap.
    See https://matplotlib.org/stable/tutorials/colors/colormaps.html
    """
    # plt.get_cmap is used because matplotlib.cm.get_cmap is no longer
    # available in newer matplotlib releases
    cmap = plt.get_cmap(colormap)
    result = result.squeeze(0)
    result = normalize_minmax(result)
    result = cmap(result)[:, :, :3] * 255
    result = result.astype(np.uint8)
    return result


def to_rgb(image_data) -> np.ndarray:
    """
    Convert image_data from BGR to RGB
    """
    return cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB)

print("1 - Load Model")
ie = Core()
model = ie.read_model(model=model_xml_path, weights=model_xml_path.with_suffix(".bin"))
compiled_model = ie.compile_model(model=model, device_name=DEVICE)
input_key = compiled_model.input(0)
output_key = compiled_model.output(0)
print("- Input layer info: {}".format(input_key))
print("- Output layer info: {}".format(output_key))
network_input_shape = list(input_key.shape)
network_image_height, network_image_width = network_input_shape[2:]
print("2 - Load Image")
IMAGE_FILE = "data/coco_bike.jpg"
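image = cv2.imread(IMAGE_FILE)
# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise FileNotFoundError("Could not read image: {}".format(IMAGE_FILE))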
print("- Input image size: {}".format(image.shape))
# resize to input shape for network; cv2.resize expects dsize as (width, height)
resized_image = cv2.resize(src=image, dsize=(network_image_width, network_image_height))
# reshape image to network input shape NCHW
input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)
print("- Image resize into: {}".format(input_image.shape))
print("3 - Model Inference")
result = compiled_model([input_image])[output_key]
print("- Inference result shape: {}".format(result.shape))
print("- convert network result of disparity map to an image that shows distance as colors.")
result_image = convert_result_to_image(result)
# resize back to original image shape. cv2.resize expects shape
# in (width, height), [::-1] reverses the (height, width) shape to match this
result_image = cv2.resize(result_image, image.shape[:2][::-1])
print("- resize back to original image shape from (width, height) to (height, width) based on cv2.resize requirement with final image shape {}".format(result_image.shape))
print("- final results visualization.")
fig, ax = plt.subplots(1, 2, figsize=(20, 15))
ax[0].imshow(to_rgb(image))
ax[1].imshow(result_image)
# show the figure window when running as a script rather than in a notebook
plt.show()

Terminal output:

1 - Load Model
- Input layer info: <ConstOutput: names[input.1] shape{1,3,256,256} type: f32>
- Output layer info: <ConstOutput: names[1349] shape{1,256,256} type: f32>
2 - Load Image
- Input image size: (600, 800, 3)
- Image resize into: (1, 3, 256, 256)
3 - Model Inference
- Inference result shape: (1, 256, 256)
- convert network result of disparity map to an image that shows distance as colors.
- resize back to original image shape from (width, height) to (height, width) based on cv2.resize requirement with final image shape (600, 800, 3)
- final results visualization.

[Figure: the input image (left) and the estimated depth map rendered with the viridis colormap (right)]
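One edge case in the colorization step deserves a note. Min-max normalization divides by the difference between the maximum and the minimum, which is zero when the network output is constant. The following is a minimal sketch of a guarded variant; the name `normalize_minmax_safe` and the `eps` threshold are assumptions for illustration and are not part of the original code.

```python
import numpy as np


def normalize_minmax_safe(data, eps=1e-8):
    """Scale `data` to the range 0..1, returning zeros for a (near-)constant input."""
    data = np.asarray(data, dtype=np.float32)
    value_range = data.max() - data.min()
    if value_range < eps:
        # A flat map carries no relative depth information; avoid dividing by zero.
        return np.zeros_like(data)
    return (data - data.min()) / value_range


flat = normalize_minmax_safe(np.full((2, 2), 7.0))
ramp = normalize_minmax_safe(np.array([0.0, 5.0, 10.0]))
print(flat)
print(ramp)
```

For ordinary photographs the output is rarely flat, so this guard mainly matters for synthetic or fully saturated inputs.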