AI Depth Estimation (translated edition) - CSDN Blog

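Copyright notice: This article is translated from an original article by CSDN blogger 「沾花把玖」. It follows the CC 4.0 BY-SA license and is published with the author's consent. When reposting, please include links to both source articles and this notice.
Original article: 人工智能深度估计技术 (沾花把玖's blog, CSDN)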

Artificial stupidity (er, intelligence), here we go!!!

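Start by finding a Depth Estimation model on Hugging Face:

Hugging Face: the AI community building the future.

(Reaching Hugging Face requires getting past the firewall! Whether you do that is up to you...)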

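1. Monocular depth estimation

Monocular depth estimation is a computer vision task that predicts the depth information of a scene from a single image. In other words, it estimates how far objects in the scene are from the camera, using a single camera viewpoint.

Monocular depth estimation has many applications, including 3D reconstruction, augmented reality, autonomous driving, and robotics. It is a challenging task because the model has to understand the complex relationships between objects in the scene and their corresponding depth, and these can be affected by factors such as lighting conditions, occlusion, and texture.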
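
To make the 3D reconstruction use case concrete, here is a minimal sketch of how a depth map could be turned into 3D points. It assumes an ideal pinhole camera; the function name depth_to_points and the intrinsics fx, fy, cx, cy are hypothetical and do not come from the original tutorial.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) array of XYZ points.

    Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical
    intrinsics (focal lengths and principal point, in pixels).
    """
    height, width = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy example: a flat wall 2 m in front of a 4x3-pixel camera.
depth = np.full((3, 4), 2.0)
points = depth_to_points(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.0)
print(points.shape)  # (12, 3)
print(points[0])     # 3D position of the top-left pixel
```

With a real prediction you would need the actual intrinsics of the camera that took the photo; a model's depth output alone does not provide them.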

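The task demonstrated in this tutorial is supported by the following model architectures:

DPT (Dense Prediction Transformer), GLPN (Global-Local Path Networks)

In this guide, you will learn how to:

Create a depth estimation pipeline
Run depth estimation inference by hand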

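Before you begin, make sure all the required libraries are installed. The examples below also import torch, Pillow, requests, and numpy, so install them if they are missing:

pip install -q transformers torch pillow requests numpy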

2. Depth estimation pipeline

The simplest way to try inference with a model that supports depth estimation is to use the corresponding pipeline(). Instantiate a pipeline from a checkpoint on the Hugging Face Hub:

from transformers import pipeline
 
checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline("depth-estimation", model=checkpoint)

Next, choose an image to analyze:

from PIL import Image
import requests
 
url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image

Photo of a busy street

Pass the image to the pipeline.

predictions = depth_estimator(image)

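The pipeline returns a dictionary with two entries. The first, predicted_depth, is a tensor whose values are the depth of each pixel expressed in meters; the second, depth, is a PIL image that visualizes the depth estimation result.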
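
As a small illustration of working with the raw depth values, the sketch below summarizes a depth map: the nearest and farthest depth, and where the nearest pixel is. It runs on a synthetic array so that it needs no model download; describe_depth is a hypothetical helper, not part of transformers.

```python
import numpy as np


def describe_depth(depth_map):
    """Return the nearest and farthest depth plus the (row, col) of the nearest pixel."""
    depth_map = np.asarray(depth_map, dtype=np.float32)
    nearest_index = np.unravel_index(np.argmin(depth_map), depth_map.shape)
    return {
        "nearest": float(depth_map.min()),
        "farthest": float(depth_map.max()),
        "nearest_pixel": tuple(int(i) for i in nearest_index),
    }


# Synthetic 2x3 depth map standing in for the predicted_depth entry.
fake_depth = np.array([[4.0, 3.5, 5.0],
                       [2.0, 1.25, 6.0]])
summary = describe_depth(fake_depth)
print(summary)
```

With a real result, something like predictions["predicted_depth"].squeeze().numpy() should give an array you can pass in, assuming the tensor lives on the CPU.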

Let's look at the visualized result:

predictions["depth"]

Depth estimation visualization

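3. Depth estimation inference by hand

Now that you have seen how to use the depth estimation pipeline, let's see how to reproduce the same result by hand.

Start by loading the model and its associated processor from a checkpoint on the Hugging Face Hub. Here we use the same checkpoint as before: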

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
 
checkpoint = "vinvino02/glpn-nyu"
 
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

Prepare the image input for the model with image_processor, which handles the necessary image transformations such as resizing and normalization:

pixel_values = image_processor(image, return_tensors="pt").pixel_values
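
To give a rough idea of what such preprocessing involves, here is a simplified sketch in plain numpy. It is not the library's implementation: preprocess_sketch is a hypothetical function, the size_divisor of 32 is an assumption, and it crops instead of resampling to keep the example short.

```python
import numpy as np


def preprocess_sketch(image_hwc, size_divisor=32):
    """Turn an (H, W, 3) uint8 image into a (1, 3, H', W') float32 batch.

    Simplification: dimensions are rounded down to a multiple of
    size_divisor by cropping, whereas a real processor would resample.
    """
    height, width, _ = image_hwc.shape
    new_height = (height // size_divisor) * size_divisor
    new_width = (width // size_divisor) * size_divisor
    cropped = image_hwc[:new_height, :new_width]
    # Rescale 0..255 integers to 0..1 floats.
    scaled = cropped.astype(np.float32) / 255.0
    # Channels-last (H, W, C) to channels-first (C, H, W), then add a batch axis.
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]


dummy = np.random.randint(0, 256, size=(70, 100, 3), dtype=np.uint8)
batch = preprocess_sketch(dummy)
print(batch.shape, batch.dtype)  # (1, 3, 64, 96) float32
```

The channels-first layout with a leading batch axis is the shape PyTorch vision models generally expect, which is why pixel_values can be passed straight to the model.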

Pass the prepared input to the model:

import torch
 
with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth

Visualize the result:

import numpy as np
 
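# Upsample the prediction to the original image size.
# PIL's image.size is (width, height), so reverse it to get (height, width).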
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
output = prediction.numpy()
 
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
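
The scaling above divides by the maximum only, so it would misbehave on an all-zero map and never uses the full 0..255 range unless the minimum depth is 0. A possible variant, shown here as a sketch and not as what the pipeline does internally, is min-max scaling with a guard for constant maps. The name depth_to_uint8 is hypothetical.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a depth array to 0..255, returning zeros for a constant map."""
    depth = np.asarray(depth, dtype=np.float32)
    low, high = depth.min(), depth.max()
    if high == low:
        # A constant map has no contrast to show; avoid dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - low) / (high - low)
    return (scaled * 255).astype(np.uint8)


sample = np.array([[1.0, 2.0], [3.0, 5.0]])
gray = depth_to_uint8(sample)
flat = depth_to_uint8(np.ones((2, 2)))
print(gray)
```

The resulting array can be passed to Image.fromarray in the same way as formatted above. Note that min-max scaling shows relative depth only: the darkest pixel is always the nearest one, whatever its distance in meters.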

Remember to like and follow~