Interactive Image Segmentation Based on the SAM Model

Latest recommended article published on 2024-02-01 18:10:32

老歌老听老掉牙

Tags: image processing, python

Article link: https://blog.csdn.net/T20151470/article/details/132286141

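This article introduces image segmentation in computer vision, in particular how semantic segmentation and instance segmentation are handled by deep learning models such as convolutional neural networks. It shows how to use the Segment Anything Model (through SamPredictor) to segment individual objects, with an example of pixel-level object segmentation on an image.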

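Image segmentation is a core task in computer vision. It partitions the pixels of an image into regions or objects so that the image content can be analyzed precisely. With segmentation, the different parts of an image can be identified, separated, and classified, which provides the input data for many downstream applications.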

Image segmentation has several subfields, including semantic segmentation and instance segmentation. Both are described below.

1. Semantic segmentation

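Semantic segmentation assigns every pixel in an image to a class, that is, it attaches a class label to each pixel. All pixels of the same class end up in the same region, whether or not they belong to the same object, so semantic segmentation does not separate individual instances. What it provides is pixel-level recognition of the object categories present in an image.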

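Semantic segmentation plays a key role in many computer vision tasks and applications, such as road and obstacle recognition in autonomous driving and tissue and lesion segmentation in medical image analysis. Common methods use deep learning models (such as convolutional neural networks) to learn the semantic content of images and are trained on data with pixel-level annotations.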
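
To make the per-pixel labeling concrete, here is a minimal sketch, assuming a model has already produced one score map per class. The class names, the toy scores, and the helper name `scores_to_label_map` are illustrative assumptions and do not come from the article's code.

```python
import numpy as np

# Hypothetical class names for this toy example (index = class ID).
CLASS_NAMES = ["background", "road", "car"]


def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Convert per-class scores of shape (C, H, W) into a label map of shape (H, W).

    Each pixel receives the ID of the class with the highest score.
    """
    return np.argmax(scores, axis=0)


# Toy scores for a 2x3 image; a real network would output these.
scores = np.array([
    [[0.90, 0.10, 0.10], [0.80, 0.20, 0.10]],  # background
    [[0.05, 0.80, 0.20], [0.10, 0.70, 0.30]],  # road
    [[0.05, 0.10, 0.70], [0.10, 0.10, 0.60]],  # car
])

label_map = scores_to_label_map(scores)
print(label_map)
for class_id, name in enumerate(CLASS_NAMES):
    print(name, int((label_map == class_id).sum()), "pixels")
```

The label map has the same height and width as the image, and every entry is a class ID, which is the sense in which semantic segmentation is "pixel-level".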

2. Instance segmentation

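Instance segmentation must not only classify every pixel but also tell individual objects apart, associating each pixel with a specific object instance. Different instances of the same class are therefore assigned to different regions, which yields a pixel-level mask for each object in the image.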

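Instance segmentation has important applications in many fields, such as object detection, robot navigation, and video surveillance. Common methods are based on deep learning and combine ideas from object detection and semantic segmentation, so that they both identify the class of each object and delineate its boundary.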
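
The difference between the two tasks can be illustrated with a small sketch, assuming an instance map in which each object has its own integer ID. The array contents and the "person" class are made up for illustration. Collapsing instance IDs to class IDs gives the semantic view, in which the two objects can no longer be told apart.

```python
import numpy as np

# Instance map: 0 = background, 1 and 2 are two separate objects.
instance_map = np.array([
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
])

# Hypothetical lookup table: index = instance ID, value = class ID.
# Class 0 is background, class 1 is "person"; both objects are persons.
instance_to_class = np.array([0, 1, 1])

# Semantic view: every pixel keeps only its class ID.
semantic_map = instance_to_class[instance_map]

# Instance view: one boolean mask per object.
instance_masks = {i: instance_map == i for i in np.unique(instance_map) if i != 0}

num_instances = len(instance_masks)
num_foreground_classes = len(np.unique(semantic_map)) - 1  # exclude background

print(semantic_map)
print("instances:", num_instances, "foreground classes:", num_foreground_classes)
```

Here the instance view reports two objects, while the semantic view only knows that some pixels belong to class 1.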

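In summary, semantic segmentation and instance segmentation are both subfields of image segmentation, and both play an important role in image understanding and other computer vision tasks. Semantic segmentation assigns each pixel to a class, while instance segmentation goes further and separates the pixels of each individual object. These techniques provide a foundation for many computer vision and artificial intelligence applications.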

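Below is an application based on the Segment Anything Model (SAM): the user clicks an object in an OpenCV window, and SAM returns candidate masks for the clicked point.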

import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamPredictor
# Overlay a segmentation mask on a matplotlib axis as a semi-transparent color layer
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)
# Draw prompt points: green stars for foreground (label 1), red stars for background (label 0)
def show_points(coords, labels, ax, marker_size=375):
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white',
               linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white',
               linewidth=1.25)

# Draw a prompt box given as [x0, y0, x1, y1]
def show_box(box, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0, 0, 0, 0), lw=2))
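# Read the input image (OpenCV loads it in BGR channel order)
# image = cv2.imread(r'C:\Users\user\Pictures\0H\1-1.jpg')
# image = cv2.imread(r'C:\Users\user\Pictures\0.50LC\1-1.jpg')
image = cv2.imread('girl.jpeg')
if image is None:
    raise FileNotFoundError("Could not read 'girl.jpeg'; check the file path")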
# #bgr->rgb
# image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# #show origin picture
# plt.figure(figsize=(10,10))
# plt.imshow(image)
# plt.axis('on')
# plt.show()

#path for sam
sam_checkpoint = r"D:\BaiduNetdiskDownload\segment-anything-main2\segment-anything-main\models\sam_vit_b_01ec64.pth"
model_type = "vit_b"

device = "cpu"  # or  "cuda"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

predictor = SamPredictor(sam)
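# cv2.imread returns BGR data, while set_image assumes RGB unless told otherwise
predictor.set_image(image, image_format="BGR")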

pos=[]
def on_EVENT_LBUTTONDOWN(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        xy = "%d,%d" % (x, y)
        cv2.circle(image, (x, y), 1, (255, 0, 0), thickness = -1)
        cv2.putText(image, xy, (x, y), cv2.FONT_HERSHEY_PLAIN,
                    1.0, (0,0,0), thickness = 1)
        cv2.imshow("image", image)
        pos.append([x,y])

cv2.namedWindow("image",0)
cv2.setMouseCallback("image", on_EVENT_LBUTTONDOWN)
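# Left-click the target object in the window, then press Esc to continue
while True:
    cv2.imshow("image", image)
    if cv2.waitKey(0) & 0xFF == 27:
        break
cv2.destroyAllWindows()
if not pos:
    raise SystemExit("No point was clicked; nothing to segment")
# Use the first clicked point as the prompt; label 1 marks it as foreground
input_point = np.array([[pos[0][0], pos[0][1]]])  # prompt point, shape (1, 2)
input_label = np.array([1])  # label of the prompt point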
# #show tips point on image
# plt.figure(figsize=(10,10))
# plt.imshow(image)
# show_points(input_point, input_label, plt.gca())
# plt.axis('on')
# plt.show()
#
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)
for i, (mask, score) in enumerate(zip(masks, scores)):
    area = int(mask.sum())  # number of pixels inside the mask
    plt.figure(figsize=(10, 10))
    # image is in BGR order and carries the click annotation; convert it for matplotlib
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}, Area: {area} px", fontsize=18)
    plt.axis('off')
    plt.show()
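
With multimask_output=True the predictor returns several candidate masks together with a quality score for each. A common follow-up step is to keep only the highest-scoring mask and derive its pixel area and bounding box. The following is a minimal sketch of that step, assuming masks is a boolean array of shape (N, H, W) and scores has length N. The helper name `summarize_masks` and the synthetic arrays are illustrative assumptions, used here so that the example runs without a SAM checkpoint.

```python
import numpy as np


def summarize_masks(masks: np.ndarray, scores: np.ndarray):
    """Return (index, area, box) for the highest-scoring mask.

    masks:  boolean array of shape (N, H, W)
    scores: array of shape (N,)
    box is [x0, y0, x1, y1] in pixel coordinates, or None for an empty mask.
    """
    best = int(np.argmax(scores))
    mask = masks[best].astype(bool)
    area = int(mask.sum())
    if area == 0:
        return best, 0, None
    ys, xs = np.nonzero(mask)
    box = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
    return best, area, box


# Synthetic stand-ins for the predictor output (not real SAM results).
masks = np.zeros((3, 6, 8), dtype=bool)
masks[0, 1:3, 1:3] = True
masks[1, 2:5, 3:7] = True
masks[2, :, :] = True
scores = np.array([0.71, 0.93, 0.55])

best, area, box = summarize_masks(masks, scores)
print(best, area, box)
```

In the script above, the returned box uses the same [x0, y0, x1, y1] layout that show_box expects, so it could be passed as show_box(box, plt.gca()) to outline the selected object.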