Shandong University School of Software Project Training Progress Log 2

夏瑾流年

Last modified: 2024-05-30 13:04:16

Tags: java, deep learning, python

First published: 2024-05-29 20:19:27

Article link: https://blog.csdn.net/kalilyee/article/details/139304915

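Date: 2024.4.8-4.14

Progress this week

1. FastSAM box segmentation

This week I tried adding box-prompted segmentation. I first tested it on a local image; example code follows: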

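import os

import matplotlib.pyplot as plt
import torch
from fastsam import FastSAM, FastSAMPrompt
from PIL import Image
from utils.tools import convert_box_xywh_to_xyxy

model_path = "./weights/FastSAM-x.pt"
img_path = "./images/demo_img.jpg"
# The original snippet read the settings below from command-line args;
# the values here are placeholder assumptions, adjust as needed.
device = "cuda" if torch.cuda.is_available() else "cpu"
retina_masks = True
imgsz = 1024
conf = 0.4
iou = 0.9

# The box is given as [x, y, width, height]; box_prompt() takes [x1, y1, x2, y2].
box_prompt = [[14.7, 346.5, 94.1, 81.8]]
box_prompt = convert_box_xywh_to_xyxy(box_prompt)

model = FastSAM(model_path)
image = Image.open(img_path).convert("RGB")

# Segment everything first; the prompt then selects from these results.
everything_results = model(
    image,
    device=device,
    retina_masks=retina_masks,
    imgsz=imgsz,
    conf=conf,
    iou=iou,
)
prompt_process = FastSAMPrompt(image, everything_results, device=device)
ann = prompt_process.box_prompt(bboxes=box_prompt)

# Save and display each returned mask as its own image.
os.makedirs("output", exist_ok=True)
for i in range(ann.shape[0]):
    plt.figure()
    plt.axis("off")
    plt.xticks([])
    plt.yticks([])
    plt.imshow(ann[i])
    plt.savefig(f"output/res{i}.png", bbox_inches="tight", pad_inches=0.0)
    plt.show()
    plt.close()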

This is only part of the code. The FastSAM model definition, FastSAMPrompt, prompt_process.box_prompt(), convert_box_xywh_to_xyxy(), and related code are not reproduced here; see the official FastSAM source code.
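
For readers who only want to see what the box-format conversion amounts to, here is a minimal sketch. It is not the FastSAM implementation; the function name `xywh_to_xyxy` and the negative-size check are assumptions made for illustration. It turns each [x, y, width, height] box into the [x1, y1, x2, y2] corner form.

```python
from typing import List, Sequence


def xywh_to_xyxy(boxes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert [x, y, width, height] boxes to [x1, y1, x2, y2] corners.

    Hypothetical helper for illustration only, not FastSAM's own function.
    """
    converted = []
    for x, y, w, h in boxes:
        if w < 0 or h < 0:
            raise ValueError(f"width and height must be non-negative, got {w}, {h}")
        # The bottom-right corner is the top-left corner shifted by the box size.
        converted.append([x, y, x + w, y + h])
    return converted


# The box used in the example above, given as x, y, width, height.
corner_boxes = xywh_to_xyxy([[14.7, 346.5, 94.1, 81.8]])
print(corner_boxes)
```

The top-left corner is unchanged, and the second corner comes out at roughly (108.8, 428.3).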

Original image:

Bounding box:

Segmentation result:

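Because this feature has not yet been connected to the front-end project, the bounding-box coordinates are for now supplied by a separate online image annotation tool, LabelImg (link attached).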
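
One way to move boxes from the annotation tool into the script is to read them from an exported annotation file. The sketch below assumes the annotations are saved as Pascal VOC XML, one of the formats LabelImg can write. The sample XML, its coordinates, and the helper name `read_voc_boxes` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample in Pascal VOC layout; a real file would come from the tool.
SAMPLE_XML = """<annotation>
  <filename>demo_img.jpg</filename>
  <object>
    <name>car</name>
    <bndbox>
      <xmin>15</xmin><ymin>347</ymin><xmax>109</xmax><ymax>428</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text: str) -> List[List[float]]:
    """Return every <bndbox> as [xmin, ymin, xmax, ymax].

    Hypothetical helper; assumes each <bndbox> contains all four tags.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for bndbox in root.iter("bndbox"):
        boxes.append(
            [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        )
    return boxes


boxes_xyxy = read_voc_boxes(SAMPLE_XML)
print(boxes_xyxy)  # [[15.0, 347.0, 109.0, 428.0]]
```

Pascal VOC boxes are already stored as corner coordinates, so under this assumption they would go to box_prompt() directly, without the xywh conversion step.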

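2. Text segmentation

Text segmentation here means referring image segmentation, a task I have worked on through two years of research, so I am very familiar with the models in this area. To implement this feature, I considered LAVT, CRIS, CGFormer, PolyFormer, and other models; the resources are listed below: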

LAVT	CRIS	CGFormer	PolyFormer
Paper	Paper	Paper	Paper
Code	Code	Code	Code

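While evaluating these models, we found that FastSAM also provides text segmentation, with faster inference and lower compute requirements than the models above. It does this with the help of CLIP (a classic cross-modal backbone model; its code is linked). After discussion within the group, we decided to use FastSAM for text segmentation for now.

Example code: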
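
The general idea behind a CLIP-based text prompt can be sketched as follows: embed each candidate mask region and the text in the same space, score each region by cosine similarity with the text, and keep the best match. This is a toy sketch, not FastSAM's code. The function name `pick_region`, the 3-dimensional vectors, and the scale factor of 100 are illustrative assumptions; real CLIP embeddings come from the model's image and text encoders.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_region(
    region_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
    scale: float = 100.0,
) -> Tuple[int, List[float]]:
    """Return the index of the region closest to the text, plus all probabilities."""
    text = l2_normalize(text_embedding)
    logits = [
        scale * sum(a * b for a, b in zip(l2_normalize(region), text))
        for region in region_embeddings
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy embeddings standing in for three cropped mask regions and one text prompt.
regions = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.0, 0.2, 0.9]]
text = [0.2, 0.9, 0.2]
best_index, probabilities = pick_region(regions, text)
print(best_index)  # 1
```

Because every mask region has to be embedded and compared, the cost of this step grows with the number of candidate masks.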

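import os

import matplotlib.pyplot as plt
import torch
from fastsam import FastSAM, FastSAMPrompt
from PIL import Image

model_path = "./weights/FastSAM-x.pt"
img_path = "./images/demo_img.jpg"
text_prompt = "a white car on the right"
# The original snippet read the settings below from command-line args;
# the values here are placeholder assumptions, adjust as needed.
device = "cuda" if torch.cuda.is_available() else "cpu"
retina_masks = True
imgsz = 1024
conf = 0.4
iou = 0.9

model = FastSAM(model_path)
image = Image.open(img_path).convert("RGB")

# Segment everything first; the prompt then selects from these results.
everything_results = model(
    image,
    device=device,
    retina_masks=retina_masks,
    imgsz=imgsz,
    conf=conf,
    iou=iou,
)
prompt_process = FastSAMPrompt(image, everything_results, device=device)
# text_prompt() relies on CLIP, so the CLIP package must be installed.
ann = prompt_process.text_prompt(text=text_prompt)

# Save and display each returned mask as its own image.
os.makedirs("output", exist_ok=True)
for i in range(ann.shape[0]):
    plt.figure()
    plt.axis("off")
    plt.xticks([])
    plt.yticks([])
    plt.imshow(ann[i])
    plt.savefig(f"output/res{i}.png", bbox_inches="tight", pad_inches=0.0)
    plt.show()
    plt.close()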

The original image is the same as above; the text prompt is: a white car on the right.

Segmentation result: