GLIP Code Debugging and Results Analysis

Latest recommended article published 2024-05-06 18:50:48

弈秋001



Tags: artificial intelligence, transformer, deep learning, object detection, computer vision

Article link: https://blog.csdn.net/weixin_42479327/article/details/136548874


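Many blog posts already explain the theory in detail, so I won't repeat it here; this post covers code debugging only.
The official code has several pitfalls.
Running it requires compiling the extensions first. The official code compiles with CUDA 10 and PyTorch 1.1x; if you are on CUDA 11/12 or a newer PyTorch version, the build will fail.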
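To find out in advance which case you are in, a rough version check can encode the rule of thumb above. This is a hypothetical sketch (parse_major_minor and official_build_likely_fails are my own names), and it assumes that "a newer PyTorch version" means PyTorch 2.x; adjust the thresholds to your own experience.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from strings such as '1.13.1+cu117' or '11.7'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def official_build_likely_fails(torch_version, cuda_version):
    """Rule of thumb from this post: the official code compiles with CUDA 10 and
    PyTorch 1.1x; CUDA 11/12 or a newer PyTorch (assumed here: 2.x) breaks it."""
    cuda_major, _ = parse_major_minor(cuda_version)
    torch_major, _ = parse_major_minor(torch_version)
    return cuda_major >= 11 or torch_major >= 2
```

With PyTorch installed, you would call it as official_build_likely_fails(torch.__version__, torch.version.cuda). Note that torch.version.cuda is None on CPU-only builds, where this check does not apply.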

Source code:
https://github.com/microsoft/GLIP

I have fixed all of these pitfalls; if you would rather not go through them yourself, use my modified version:
https://github.com/yblir/GLIP_detection
Usage: once compilation succeeds, run glip_predict.py in the repository root.

1. Build Script


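If, after a successful build, you still get ImportError: cannot import name '_C' from 'maskrcnn_benchmark', move the _C.cp38xxx file into the maskrcnn_benchmark directory. Running the code without compiling first produces the same error for the same reason: there is no _C.cp38xxx file.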
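If you would rather not hunt for the file by hand, the sketch below copies it into place. It is a hypothetical helper, and it assumes the compiled extension was left somewhere under the repository's build/ directory (as setuptools normally does) with a name starting with _C and ending in .pyd (Windows) or .so (Linux). It copies rather than moves, which leaves the build tree intact.

```python
import shutil
from pathlib import Path


def copy_compiled_extension(repo_root):
    """Hypothetical helper: copy every compiled _C extension found under
    <repo_root>/build into <repo_root>/maskrcnn_benchmark; return the new paths."""
    root = Path(repo_root)
    build_dir = root / "build"
    package_dir = root / "maskrcnn_benchmark"

    found = []
    if build_dir.is_dir():
        for pattern in ("_C*.pyd", "_C*.so"):  # Windows and Linux naming
            found.extend(build_dir.rglob(pattern))
    if not found:
        raise FileNotFoundError(f"no compiled _C extension under {build_dir}; build the project first")

    copied = []
    for src in found:
        dst = package_dir / src.name
        shutil.copy2(src, dst)  # keep the original in build/
        copied.append(dst)
    return copied
```

From the project root, copy_compiled_extension(".") is enough.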

I have verified that the build succeeds on both Windows 10 and Linux.

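If you don't want to do the work yourself, you can use my code, which fixes all of these pitfalls without modifying any of the core model code. Otherwise, most problems you will hit during compilation can be solved by following this blog post: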
https://blog.csdn.net/code_zhao/article/details/129172817

Pay special attention to this part of that post; change the line to:
dim3 grid(std::min(ceil_div(int(**), 512), 4096));

2. Code Debugging

2.1 Error caused by newer PyTorch versions; the affected code is not used here, so it can be commented out

AttributeError: module 'torch' has no attribute '_six'

maskrcnn_benchmark/utils/imports.py

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch

# if torch._six.PY37:
#     import importlib
#     import importlib.util
#     import sys
#
#
#     # from https://stackoverflow.com/questions/67631/how-to-import-a-module-given-the-full-path?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
#     def import_file(module_name, file_path, make_importable=False):
#         spec = importlib.util.spec_from_file_location(module_name, file_path)
#         module = importlib.util.module_from_spec(spec)
#         spec.loader.exec_module(module)
#         if make_importable:
#             sys.modules[module_name] = module
#         return module
# else:
# imp is deprecated and was removed in Python 3.12; it still works on Python 3.8
import imp


def import_file(module_name, file_path, make_importable=None):
    module = imp.load_source(module_name, file_path)
    return module
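The imp module used above no longer exists on Python 3.12 and later. On a newer Python, one option is to keep the importlib-based branch that is commented out above and simply drop the torch._six check. The sketch below is that branch as a standalone function; it is not a change the original post makes.

```python
import importlib.util
import sys


def import_file(module_name, file_path, make_importable=False):
    """Load a module from an explicit file path without torch._six or imp."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if make_importable:
        # Register it so later `import module_name` statements find this module
        sys.modules[module_name] = module
    return module
```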

2.2 Rename the imported function

ImportError: cannot import name '_download_url_to_file' from 'torch.utils.model_zoo'

# On line 6, remove the leading underscore from _download_url_to_file
#from torch.hub import _download_url_to_file
from torch.hub import download_url_to_file
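If the same checkout has to run on both older and newer PyTorch releases, a fallback import avoids editing the file again. The helper below (import_first is a hypothetical name) is a sketch that tries (module, attribute) pairs in order with importlib and returns the first one that exists.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of
    (module_name, attribute_name) pairs; raise ImportError if none works."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))
```

For this case the call would be import_first([("torch.hub", "download_url_to_file"), ("torch.hub", "_download_url_to_file")]), trying the public name first and the old private one second.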

2.3 Download the BERT model manually

OSError: Can't load config for 'bert-base-uncased'. 
If you were trying to load it from 'https://huggingface.co/models', make sure you don't have 
a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct 
path to a directory containing a config.json file

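The fix is to load BERT from local files. Create a folder named bert-base-uncased (the name must match exactly) in the project root and put the model files downloaded from Hugging Face into it. The code calls from_pretrained('bert-base-uncased'), which would otherwise download the model from the internet; a local folder with the same name takes precedence, so it effectively overrides the download location. Achieving the same effect any other way would require editing many config files.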
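Before launching, a quick check that the local folder really contains what from_pretrained needs can save a failed run. The helper below is hypothetical; the required file list (config.json, vocab.txt and a pytorch_model.bin or model.safetensors weight file) is an assumption based on how the bert-base-uncased repository on Hugging Face is laid out. The relative path is resolved against the current working directory, so run it from the project root.

```python
from pathlib import Path


def missing_bert_files(model_dir="bert-base-uncased"):
    """Return the names of required files that are missing from model_dir."""
    folder = Path(model_dir)
    missing = [name for name in ("config.json", "vocab.txt") if not (folder / name).is_file()]
    weight_files = ("pytorch_model.bin", "model.safetensors")
    if not any((folder / name).is_file() for name in weight_files):
        missing.append(" or ".join(weight_files))
    return missing
```

An empty list means the folder looks complete.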

2.4 nltk_data: a 600 MB+ package that sometimes fails to download even with an internet connection; download it manually and change the load path

xml.etree.ElementTree.ParseError: unclosed token: line 472, column 6

Download it from here:
https://github.com/nltk/nltk_data/tree/gh-pages

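Put it anywhere you like. Remember to manually unzip punkt under tokenizers, then add the folder to NLTK's search path and comment out the two download lines in this file: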

maskrcnn_benchmark/engine/predictor_glip.py

Here, the nltk_data folder holds the contents of the downloaded packages directory.
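The manual unzip step can be scripted. This is a hypothetical sketch (unzip_punkt is my own name) that assumes the nltk_data layout described above, i.e. nltk_data/tokenizers/punkt.zip, and that the archive contains a top-level punkt/ folder.

```python
import zipfile
from pathlib import Path


def unzip_punkt(nltk_data_dir):
    """Extract <nltk_data_dir>/tokenizers/punkt.zip next to the archive, unless
    tokenizers/punkt/ already exists, and return the punkt folder path."""
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    punkt_dir = tokenizers_dir / "punkt"
    if not punkt_dir.is_dir():
        with zipfile.ZipFile(tokenizers_dir / "punkt.zip") as archive:
            # Assumes the archive has a top-level punkt/ folder
            archive.extractall(tokenizers_dir)
    return punkt_dir
```

After that, register the folder before NLTK is used, for example with nltk.data.path.append(r"D:\nltk_data") near the commented-out download lines, where D:\nltk_data is a placeholder for wherever you put the data.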

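2.5 NumPy version issue: np.float was removed in NumPy 1.24, so replace every occurrence in the code with np.float32. (np.float was an alias of the builtin float, which is 64-bit, so float or np.float64 is the exact equivalent; np.float32 also works here.)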

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`.
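Rather than searching by hand, a small script can do the replacement across the project. This is a hypothetical sketch that assumes NumPy is imported as np. The regular expression only matches np.float as a whole name, so np.float32, np.float64, np.float_ and np.floating are left alone. The replacement defaults to np.float32 as in this post; pass "np.float64" to keep the old 64-bit behaviour exactly.

```python
import re
from pathlib import Path

# np.float as a complete name; the trailing \b rejects np.float32, np.float_, np.floating
NP_FLOAT = re.compile(r"\bnp\.float\b")


def replace_np_float(source, replacement="np.float32"):
    """Return source with every bare np.float replaced."""
    return NP_FLOAT.sub(replacement, source)


def patch_np_float(root, replacement="np.float32"):
    """Rewrite every .py file under root in place; return the paths that changed."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new_text = replace_np_float(text, replacement)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Commit or back up the repository before running patch_np_float, since it edits files in place.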


2.6 A normal warning that can safely be ignored

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'bert.pooler.dense.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'bert.pooler.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

It can be suppressed by setting the transformers log level to error:

from transformers import logging 
logging.set_verbosity_error()

3. Prediction Code

import warnings

warnings.filterwarnings("ignore")
from transformers import logging

logging.set_verbosity_error()
# pylab.rcParams['figure.figsize'] = 20, 12
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.engine.predictor_glip import GLIPDemo

import cv2
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont


class Colors:
    # Ultralytics color palette https://ultralytics.com/
    def __init__(self):
        hexs = (
            "FF3838", "FF9D97", "FF701F", "FFB21D", "CFD231", "48F90A", "92CC17", "3DDB86", "1A9334", "00D4BB",
            "2C99A8", "00C2FF", "344593", "6473FF", "0018EC", "8438FF", "520085", "CB38FF", "FF95C8", "FF37C7",
        )
        self.palette = [self.hex2rgb(f"#{c}") for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        """Returns color from palette by index `i`, in BGR format if `bgr=True`, else RGB; `i` is an integer index."""
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):
        """Converts hexadecimal color `h` to an RGB tuple (PIL-compatible) with order (R, G, B)."""
        return tuple(int(h[1 + i: 1 + i + 2], 16) for i in (0, 2, 4))


def draw_images(image, boxes, classes, scores, colors, xyxy=True):
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image[:, :, ::-1])
    if isinstance(boxes, torch.Tensor):
        boxes = boxes.cpu().numpy()

    # Set up the font for Pillow drawing; the size scales with the image height
    font = ImageFont.truetype(font='configs/simhei.ttf', size=int(np.floor(3e-2 * image.size[1] + 0.5)))
    # Number of times each box is redrawn; larger images get thicker outlines
    thickness = max((image.size[0] + image.size[1]) // 300, 1)
    draw = ImageDraw.Draw(image)

    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box
        color = colors[i]

        label = '{}:{:.2f}'.format(classes[i], scores[i])
        tx1, ty1, tx2, ty2 = font.getbbox(label)
        # Width and height of the rendered label text
        tw, th = tx2 - tx1, ty2 - ty1

        # Put the label above the box if it fits, otherwise just inside the top edge
        text_origin = (x1, y1 - th) if y1 - th >= 0 else (x1, y1 + 1)

        # Redraw the box several times, shrinking it by one pixel each time, to thicken the border
        for j in range(thickness):
            draw.rectangle((x1 + j, y1 + j, x2 - j, y2 - j), outline=color)
        # Draw the label background, then the label text
        draw.rectangle((text_origin[0], text_origin[1], text_origin[0] + tw, text_origin[1] + th), fill=color)
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)

    return image


config_file = "configs/pretrain/glip_Swin_T_O365_GoldG.yaml"
weight_file = r'E:\PyCharm\PreTrainModel\glip_tiny_model_o365_goldg_cc_sbu.pth'

cfg.local_rank = 0
cfg.num_gpus = 1
cfg.merge_from_file(config_file)
cfg.merge_from_list(["MODEL.WEIGHT", weight_file])
cfg.merge_from_list(["MODEL.DEVICE", "cuda"])

glip_demo = GLIPDemo(
    cfg,
    min_image_size=800,
    confidence_threshold=0.7,
    show_mask_heatmaps=False
)

def glip_inference(image_, caption_):
    # Palette for per-class box colours (the classes depend on what is extracted from the caption)
    colors_ = Colors()

    preds = glip_demo.compute_prediction(image_, caption_)
    top_preds = glip_demo._post_process(preds, threshold=0.5)

    # Extract the predicted labels, scores and boxes
    labels = top_preds.get_field("labels").tolist()
    scores = top_preds.get_field("scores").tolist()
    boxes = top_preds.bbox.detach().cpu().numpy()

    # Pick a box colour for each predicted label
    colors = [colors_(idx) for idx in labels]
    # Map label indices to class names
    labels_names = glip_demo.get_label_names(labels)

    return boxes, scores, labels_names, colors


if __name__ == '__main__':
    # caption = 'bobble heads on top of the shelf'
    # caption = "Striped bed, white sofa, TV, carpet, person"
    # caption = "table on carpet"
    caption = "Table, TV"

    image = cv2.imread('docs/demo.jpg')
    boxes, scores, labels_names, colors = glip_inference(image, caption)

    print(labels_names, scores)
    print(boxes)

    image = draw_images(image=image, boxes=boxes, classes=labels_names, scores=scores, colors=colors)
    image.show()

4. Results Analysis

Different prompts produce some remarkable results.

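5. Summary

If the object named in the prompt is not in the image, the model may still force a detection onto something else.
It cannot handle sentences that are too long; they exceed what the model can understand.
The prompt matters a lot: an object that is not detected may well be found after rephrasing.
As an open-set detection model, GLIP is very good, for example at zero-shot detection, and it can also be used for video detection and fast automatic annotation. Its detection accuracy, however, cannot yet match supervised algorithms.
For industrial deployment, forget about zero-shot: the Schrödinger's-cat-style results are not reliable. Fine-tuning should improve things considerably.
My company once deployed a large CLIP-based vision model, and even the base version struggled to run with acceptable performance; GLIP will face the same problem in production.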
