A Detailed Walkthrough of SuperGluePretrainedNetwork

RobotsRuning

Last modified 2024-03-26 19:11:34

First published 2024-03-25 22:54:25

Article link: https://blog.csdn.net/u014374826/article/details/137024200

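This article explains in detail how the SuperGlue pretrained network is applied to image-pair matching, covering the script's functionality, use cases, and model structure. The main topics are image processing, feature matching, visualization, and interactive control, and it is aimed at computer vision research and related fields.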

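The directory structure shows a simplified layout of the SuperGluePretrainedNetwork project, a deep learning project that uses the SuperGlue algorithm for image matching. It mainly consists of the pretrained models and the scripts that perform the matching.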

`demo_superglue.py`

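The main purpose of `demo_superglue.py` is to demonstrate how well the SuperGlue pretrained network matches features across image pairs. It takes a live camera feed, a video file, or an image directory as input, detects and matches keypoints in real time, and visualizes the matches. It is an interactive demo: keyboard controls let the user adjust matching parameters while the keypoints and the matching process are displayed.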
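
Below is a minimal sketch of how an `--input` value could be routed to one of these source types. It is an assumption based on the `--input` help text (webcam ID, IP camera URL, image directory, or movie file), not the repository's `VideoStreamer` code. The function name `classify_input` and the URL schemes it checks are hypothetical.

```python
from pathlib import Path


def classify_input(source: str) -> str:
    """Hypothetical helper: classify an --input value by the kind of source.

    The rules follow the --input help text; the repository's VideoStreamer
    class may apply different or additional checks.
    """
    if source.isdigit():
        # A bare integer string such as the default '0' is a USB webcam ID.
        return 'usb_camera'
    if source.startswith(('http://', 'https://', 'rtsp://')):
        # Assumption: network URLs with these schemes are IP camera streams.
        return 'ip_camera'
    path = Path(source)
    if path.is_dir():
        # Files inside the directory would be selected with --image_glob.
        return 'image_directory'
    if path.is_file():
        # Any other existing file is treated as a movie file.
        return 'movie_file'
    raise ValueError(f'Unrecognized --input value: {source!r}')


if __name__ == '__main__':
    for value in ['0', 'rtsp://192.168.1.10/stream', '.']:
        print(value, '->', classify_input(value))
```

The digit check comes first because the default `--input` is the string `'0'`. That value must be read as a webcam ID, not as a file path.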

Summary of the script's functionality:

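Input handling: The script accepts several kinds of input, including a USB webcam, an IP camera, an image directory, or a video file. The source is selected with a command-line argument.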

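Parameter configuration: Command-line arguments let the user customize several matching-related settings. These include the keypoint detection threshold, the non-maximum suppression (NMS) radius, the number of Sinkhorn iterations, and the match threshold. Together they control how the algorithm detects and matches keypoints.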

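Feature matching: The SuperPoint model detects keypoints and computes their descriptors, and the SuperGlue model matches them. When the input is a video or an image sequence, the script matches each new frame against a reference (anchor) frame. The reference frame is initially the first frame.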

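Visualization: Keypoint detections and matches are displayed in real time, with matched keypoints connected by lines. The user can choose whether to show the keypoints.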

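Interactive control: Keyboard shortcuts let the user adjust the keypoint and match thresholds on the fly and set the current frame as the reference frame. They also toggle keypoint visualization on or off.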

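Output: If an output directory is specified, the matching results are saved as images. Each image contains the original frames, the detected keypoints, and the lines connecting matched keypoints.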
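
The Sinkhorn iterations and match threshold mentioned above can be illustrated with a small pure-Python sketch with two parts:

- Log-domain Sinkhorn normalization with a "dustbin" row and column, which absorb keypoints that have no match.
- Mutual-nearest-neighbour match extraction.

This is a simplified illustration, not the repository's PyTorch implementation. The function names, the fixed dustbin score `alpha` (a learned parameter in SuperGlue), and the toy score matrix are all assumptions. Here `iters` plays the role of `--sinkhorn_iterations` and `threshold` that of `--match_threshold`.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_sinkhorn_with_dustbin(scores, alpha, iters):
    """Return an (M+1) x (N+1) matrix of log assignment probabilities.

    scores: M x N nested list of match scores between keypoints of two images.
    alpha:  score given to every dustbin entry (hypothetical fixed value here).
    iters:  number of Sinkhorn iterations.
    """
    m, n = len(scores), len(scores[0])
    # Append a dustbin column to each row, then a dustbin row.
    z = [list(row) + [alpha] for row in scores] + [[alpha] * (n + 1)]
    norm = -math.log(m + n)
    # Target marginals: each real keypoint carries mass 1/(M+N); the dustbin
    # row (mass N/(M+N)) can absorb every keypoint of image 1, and the
    # dustbin column (mass M/(M+N)) every keypoint of image 0.
    log_mu = [norm] * m + [math.log(n) + norm]
    log_nu = [norm] * n + [math.log(m) + norm]
    u = [0.0] * (m + 1)
    v = [0.0] * (n + 1)
    for _ in range(iters):
        # Rescale rows to match log_mu, then columns to match log_nu.
        u = [log_mu[i] - logsumexp([z[i][j] + v[j] for j in range(n + 1)])
             for i in range(m + 1)]
        v = [log_nu[j] - logsumexp([z[i][j] + u[i] for i in range(m + 1)])
             for j in range(n + 1)]
    # Subtract norm (i.e. add log(M+N)) so that real rows/columns sum to ~1.
    return [[z[i][j] + u[i] + v[j] - norm for j in range(n + 1)]
            for i in range(m + 1)]


def extract_matches(log_p, threshold):
    """For each keypoint in image 0, return its match index in image 1 or -1.

    A match is kept only if it is a mutual best match (dustbins excluded)
    and its probability exceeds the threshold.
    """
    m, n = len(log_p) - 1, len(log_p[0]) - 1
    best_j = [max(range(n), key=lambda j: log_p[i][j]) for i in range(m)]
    best_i = [max(range(m), key=lambda i: log_p[i][j]) for j in range(n)]
    matches = []
    for i in range(m):
        j = best_j[i]
        mutual = best_i[j] == i
        matches.append(j if mutual and math.exp(log_p[i][j]) > threshold else -1)
    return matches


if __name__ == '__main__':
    # Toy scores: keypoints 0 and 1 clearly correspond; keypoint 2 does not.
    scores = [[5.0, 0.0, 0.0],
              [0.0, 5.0, 0.0],
              [0.0, 0.0, -5.0]]
    log_p = log_sinkhorn_with_dustbin(scores, alpha=1.0, iters=20)
    print(extract_matches(log_p, threshold=0.2))  # [0, 1, -1]
```

Working in the log domain with a max-shifted `logsumexp` keeps the iterations numerically stable even when scores are large. Raising the threshold trades fewer matches for higher confidence in the ones that remain.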

Use cases:

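The script is well suited to demonstrating and evaluating SuperGlue's feature-matching performance under different scenes and conditions. It is useful in computer vision research, robot navigation, augmented reality, and similar fields. It gives developers and researchers a convenient tool for understanding and using the SuperGlue model.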

#! /usr/bin/env python3
# This is a Python script; the shebang line above declares Python 3 as the interpreter.

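# Import the required modules and libraries: pathlib handles file paths,
# argparse parses command-line arguments, cv2 is the OpenCV image processing library,
# matplotlib.cm provides color maps, and torch is the PyTorch deep learning library.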
from pathlib import Path
import argparse
import cv2
import matplotlib.cm as cm
import torch

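# Import the Matching class and several utility functions from the local models package:
# Matching combines the SuperPoint and SuperGlue models for image matching.
# AverageTimer is a utility class for timing and computing average run times.
# VideoStreamer is a utility class for reading video frames or images from various sources.
# make_matching_plot_fast is a function that quickly renders an image showing the matches.
# frame2tensor is a function that converts an image frame to a PyTorch tensor.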

from models.matching import Matching
from models.utils import (AverageTimer, VideoStreamer,
                          make_matching_plot_fast, frame2tensor)
# Disable PyTorch autograd globally: this script only runs inference, so no gradients are needed.
torch.set_grad_enabled(False)


if __name__ == '__main__':
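    # Create an ArgumentParser object to parse the command-line arguments.
    # description describes the program; formatter_class controls how the help text is formatted
    # (ArgumentDefaultsHelpFormatter appends each argument's default value to its help).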
    parser = argparse.ArgumentParser(
        description='SuperGlue demo',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # Add the --input argument, which specifies the input source: a USB webcam ID,
    # an IP camera URL, an image directory, or a video file path.
    # The default '0' selects the default USB webcam.
    parser.add_argument(
        '--input', type=str, default='0',
        help='ID of a USB webcam, URL of an IP camera, '
             'or path to an image directory or movie file')
    # Add the --output_dir argument: the directory to write output frames to. If it is None (the default), nothing is written.
    parser.add_argument(
        '--output_dir', type=str, default=None,
        help='Directory where to write output frames (If None, no output)')
    # Add the --image_glob argument: glob patterns for the image files to read when the input is an image directory.
    # The default ['*.png', '*.jpg', '*.jpeg'] reads PNG, JPG, and JPEG files.
    parser.add_argument(
        '--image_glob', type=str, nargs='+', default=['*.png', '*.jpg', '*.jpeg'],
        help='Glob if a directory of images is specified')
    # Add the --skip argument: the frame/image step used when the input is a video or image directory. The default 1 processes every frame, i.e. nothing is skipped.
    parser.add_argument(
        '--skip', type=int, default=1,
        help='Images to skip if input is a movie or directory')
    # Add the --max_length argument: the maximum number of frames or images to read from a video or image directory. The default 1000000 effectively reads all of them.
    parser.add_argument(
        '--max_length', type=int, default=1000000,
        help='Maximum length if input is a movie or directory')
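    # Add the --resize argument, which resizes the input image before running inference.
    # Two numbers resize to exactly that width and height; one number resizes the larger dimension to that value;
    # -1 disables resizing. The default [640, 480] resizes images to 640x480 (W x H).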
    parser.add_argument(
        '--resize', type=int, nargs='+', default=[640, 480],
        help='Resize the input image before running inference. If two numbers, '
             'resize to the exact dimensions, if one number, resize the max '
             'dimension, if -1, do not resize')
    # Add the --superglue argument, which selects SuperGlue's indoor or outdoor pretrained weights.
    # Valid values are 'indoor' and 'outdoor'; the default is 'indoor'.
    parser.add_argument(
        '--superglue', choices={'indoor', 'outdoor'}, default='indoor',
        help='SuperGlue weights')
    # Add the --max_keypoints argument: the maximum number of keypoints to keep.
    # The default -1 keeps all detected keypoints.
    parser.add_argument(
        '--max_keypoints', type=int, default=-1,
        help='Maximum number of keypoints detected by Superpoint'
             ' (\'-1\' keeps all keypoints)')
    # Add the --keypoint_threshold argument: the confidence threshold of the SuperPoint keypoint detector. The default is 0.005.
    parser.add_argument(
        '--keypoint_threshold', type=float, default=0.005,
        help='SuperPoint keypoint detector confidence threshold')
    # Add the --nms_radius argument: the non-maximum suppression (NMS) radius used by SuperPoint. The default is 4, and the help text notes that it must be positive.
    parser.add_argument(
        '--nms_radius', type=int, default=4,
        help='SuperPoint Non Maximum Suppression (NMS) radius'
        ' (Must be positive)')
    # Add the --sinkhorn_iterations argument: the number of Sinkhorn iterations SuperGlue performs. The default is 20.
    parser.add_argument(
        '--sinkhorn_iterations', type=int, default=20,
        help='Number of Sinkhorn iterations performed by SuperGlue')
    # Add the --match_threshold argument: SuperGlue's match threshold; only matches whose confidence exceeds it are kept. The default is 0.2.
    parser.add_argument(
        '--match_threshold', type=float, default=0.2,
        help='SuperGlue match threshold')
    # Add the --show_keypoints flag (boolean). When set, the detected keypoints are drawn on the result images.
    parser.add_argument(
        '--show_keypoints', action='store_true',
        help='Show the detected keypoints')
    # Add the --no_display flag (boolean).
    # When set, no images are shown on screen, which is useful when running remotely.
    parser.add_argument(
        '--no_display', action='store_true',
        help='Do not display images to screen. Useful if running remotely')
    # Add the --force_cpu flag (boolean).
    # When set, inference is forced to run on the CPU even if a GPU is available.
    parser.add_argument(
        '--force_cpu', action='store_true',
        help='Force pytorch to run in CPU mode.')

    # Parse the command-line arguments and print the parsed result.
    opt = parser.parse_args()
    print(opt)
    # Normalize and validate --resize ('--resize 640 -1' is treated as '--resize 640'), then print which resize operation will be applied.
    if len(opt.resize) == 2 and opt.resize[1] == -1:
        opt.resize = opt.resize[0:1]
    if len(opt.resize) == 2:
        print('Will resize to {}x{} (WxH)'.format(
            opt.resize[0], opt.resize[1]))
    elif len(opt.resize) == 1 and opt.resize[0] > 0:
        print('Will resize max dimension to {}'.format(opt.resize[0]))
    elif len(opt.resize) == 1:
        print('Will not resize images')
    else:
        raise ValueError('Cannot specify more than two integers for --resize')
    # Choose the inference device: CUDA if a GPU is available and --force_cpu is not set, otherwise the CPU. Then print the chosen device.
    device = 'cuda' if torch.cuda.is_available() and not opt.force_cpu else 'cpu'
    print('Running inference on device \"{}\"'.format(device))
    # Build a config dictionary holding the SuperPoint and SuperGlue parameters, taken from the command-line arguments.
    config = {
        'superpoint': {
            'nms_radius': opt.nms_radius,
            'keypoint_threshold': opt.keypoint_threshold,
            'max_keypoints': opt.max_keypoints
        },
        'superglue': {
            'weights': opt.superglue,
            'sinkhorn_iterations': opt.sinkhorn_iterations,
            'match_threshold': opt.match_threshold,
        }
    }
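    # Instantiate a Matching object, switch it to evaluation mode, and move it to the chosen device.
    # The keys list names the SuperPoint outputs (keypoints, scores, descriptors) that will be stored for the reference frame.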
    matching = Matching(config).eval().to(device)
    keys = ['keypoints', 'scores', 'descriptors']
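    # Create a VideoStreamer object to read frames or images from the given input source.
    # Read the first frame and check that it succeeded; otherwise the assertion fails with a hint to try a different --input.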
    vs = VideoStreamer(opt.input, opt.resize, opt.skip,
                       opt.image_glob, opt.max_length)
    frame, ret = vs.next_frame()
    assert ret, 'Error when reading the first frame (try different --input?)'
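    # Convert the first frame to a tensor and run the SuperPoint model on it.
    # Store the result in the last_data dictionary along with a few other required entries, and save the first frame's image and frame ID.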
    frame_tensor = frame2tensor(frame, device)
    last_data = matching.superpoint({'image': frame_tensor}