sam_vit_h_4b8939 Model Overview

CCSBRIDGE

Published 2025-04-09 13:49:26


Original article: https://blog.csdn.net/weixin_47420447/article/details/147092149


📘 Model card: `sam_vit_h_4b8939.pth`

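Model name: Segment Anything Model (SAM), ViT-Huge
File name: sam_vit_h_4b8939.pth
Publisher: Meta AI
Release: official GitHub repository facebookresearch/segment-anything; a Transformers-format conversion is on Hugging Face as facebook/sam-vit-huge
License: Apache-2.0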

🧠 Overview

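sam_vit_h_4b8939.pth is the largest and most accurate of the three Segment Anything Model (SAM) checkpoints released by Meta AI (ViT-B, ViT-L, ViT-H). It uses a ViT-H (Vision Transformer, Huge) image encoder and has strong zero-shot segmentation ability: it can segment objects in images and domains it was not trained on, without fine-tuning.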

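The model performs promptable, interactive segmentation: you supply points, boxes, or a coarse mask as a prompt, and it returns a high-quality mask for the indicated object. It also has an automatic mode that needs no user prompt and generates masks for every object in the image (internally, it prompts the model with a regular grid of points).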
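The `.pth` file itself is meant for Meta's official `segment_anything` package rather than for Transformers. Below is a minimal sketch of point and box prompting with that package, assuming it is installed and the checkpoint sits in the working directory; `build_prompts` and `segment_with_checkpoint` are hypothetical helper names. Label 1 marks a foreground click, label 0 a background click, and a box is given in pixels as `(x0, y0, x1, y1)`.

```python
import numpy as np


def build_prompts(fg_points, bg_points=(), box=None):
    """Pack clicks and an optional box into the arrays SamPredictor.predict expects.

    fg_points / bg_points: iterables of (x, y) pixel coordinates.
    box: optional (x0, y0, x1, y1) in pixels.
    """
    fg = [tuple(p) for p in fg_points]
    bg = [tuple(p) for p in bg_points]
    if fg or bg:
        point_coords = np.array(fg + bg, dtype=np.float32)  # shape (N, 2), (x, y) order
        point_labels = np.array([1] * len(fg) + [0] * len(bg), dtype=np.int32)  # 1 = fg, 0 = bg
    else:
        point_coords = point_labels = None
    box_arr = None if box is None else np.array(box, dtype=np.float32)  # XYXY pixel coordinates
    return point_coords, point_labels, box_arr


def segment_with_checkpoint(image_rgb, checkpoint="sam_vit_h_4b8939.pth", device="cuda",
                            fg_points=(), bg_points=(), box=None):
    """Run SAM ViT-H on one HxWx3 uint8 RGB image and return the best candidate mask."""
    # Imported lazily so build_prompts stays usable without the package installed
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint).to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # runs the heavy image encoder once for this image

    point_coords, point_labels, box_arr = build_prompts(fg_points, bg_points, box)
    masks, scores, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        box=box_arr,
        multimask_output=True,  # return 3 candidates to cope with ambiguous prompts
    )
    best = int(np.argmax(scores))  # keep the candidate with the highest predicted IoU
    return masks[best], float(scores[best])
```

Because `set_image` caches the image embedding, an interactive tool would normally build the model and `SamPredictor` once and call `predict` repeatedly for new clicks, rather than calling this helper per click.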

🏗️ Model architecture

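SAM consists of three main components:

Image encoder (ViT-H): the backbone feature extractor. It splits the 1024×1024 input into 16×16 patches and encodes the whole image once; a small convolutional neck at the end of the encoder projects the features into a 256-channel, 64×64 image embedding.
Prompt encoder: encodes the user's prompts (points and boxes as positional embeddings, mask prompts via convolutions).
Mask decoder (lightweight two-way Transformer): lets the prompt tokens and the image embedding attend to each other, then outputs several candidate masks (three by default, to handle ambiguous prompts) together with a predicted IoU score for each.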
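A shape-level walkthrough of this data flow can make the architecture concrete. This is a sketch: the ViT-H hyperparameters below are the values used by `build_sam_vit_h` in the official repository, and `sam_shapes` is a hypothetical helper that only does the arithmetic (no model is loaded).

```python
# ViT-H settings as used by build_sam_vit_h in the official segment_anything repo
VIT_H = {
    "img_size": 1024,          # inputs are resized and padded to 1024 x 1024
    "patch_size": 16,
    "embed_dim": 1280,         # width of the ViT-H trunk
    "depth": 32,               # number of Transformer blocks
    "num_heads": 16,
    "global_attn_indexes": (7, 15, 23, 31),  # the other blocks use windowed attention
    "prompt_embed_dim": 256,   # channel count after the neck, shared with the decoder
}


def sam_shapes(cfg=VIT_H, num_candidates=3):
    """Tensor shapes (batch dimension omitted) at each stage of SAM for one image."""
    grid = cfg["img_size"] // cfg["patch_size"]  # 1024 / 16 = 64 patches per side
    return {
        # 4096 tokens of width 1280 (the real code keeps them as a 64x64 grid)
        "patch_tokens": (grid * grid, cfg["embed_dim"]),
        "image_embedding": (cfg["prompt_embed_dim"], grid, grid),  # after the neck
        "low_res_masks": (num_candidates, 4 * grid, 4 * grid),     # decoder upsamples 4x
        "iou_predictions": (num_candidates,),                      # one score per mask
    }


for stage, shape in sam_shapes().items():
    print(f"{stage:16s} {shape}")
```

The expensive part is the image encoder, which runs once per image; the prompt encoder and mask decoder then operate only on the 256×64×64 embedding, which is why interactive prompting stays fast.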

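📦 Download (the official checkpoint is linked from the facebookresearch/segment-anything README; the command below uses the hf-mirror.com Hugging Face mirror):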

```bash
wget -O sam_vit_h_4b8939.pth "https://hf-mirror.com/HCMUE-Research/SAM-vit-h/resolve/main/sam_vit_h_4b8939.pth?download=true"
```

📊 Model specifications

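| Item | Value |
|---|---|
| Model type | Promptable image segmentation |
| Backbone | ViT-Huge |
| Parameters | 641M |
| Weight format | PyTorch `.pth` |
| Training data | SA-1B: 1.1B masks on 11M licensed, privacy-protected images (the masks were produced largely automatically by SAM's data engine) |
| Training task | Prompt → mask mapping, with points, boxes, and masks as prompts |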
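To check the parameter count against the downloaded file, one can sum tensor sizes per top-level module. A minimal sketch, assuming `torch.load` on this checkpoint returns a plain state dict (which is how the official `build_sam` loads it) with keys prefixed by `image_encoder.`, `prompt_encoder.`, or `mask_decoder.`; `count_params_by_module` is a hypothetical helper, and it counts every saved tensor, including any persistent buffers.

```python
import math
from collections import Counter


def count_params_by_module(shapes):
    """Sum element counts per top-level prefix.

    shapes: mapping from state-dict key (e.g. 'image_encoder.blocks.0.attn.qkv.weight')
    to that tensor's shape.
    """
    counts = Counter()
    for name, shape in shapes.items():
        module = name.split(".", 1)[0]      # 'image_encoder', 'prompt_encoder', ...
        counts[module] += math.prod(shape)  # number of elements in this tensor
    return dict(counts)
```

Usage: `count_params_by_module({k: v.shape for k, v in torch.load("sam_vit_h_4b8939.pth", map_location="cpu").items()})`. Passing shapes instead of tensors keeps the helper independent of PyTorch.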

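🚀 Usage (Hugging Face Transformers example)

Transformers does not read the `.pth` file directly; it loads `facebook/sam-vit-huge`, a Transformers-format conversion of the same ViT-H weights.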

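```python
import requests
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

input_points = [[[450, 600]]]  # one (x, y) point on the target object, e.g. a car window

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Upscale the low-resolution mask logits to the original image size and binarize them
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores  # predicted IoU of each of the 3 candidate masks
```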
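`post_process_masks` needs both `reshaped_input_sizes` and `original_sizes` because SAM resizes the image so its longest side is 1024 px, pads it to 1024×1024, and predicts 256×256 mask logits. The sketch below mirrors that flow as found in the official repo (`ResizeLongestSide` and `Sam.postprocess_masks`), with one swapped-in simplification: it uses nearest-neighbour resizing where the real code uses bilinear interpolation, so edges come out blockier. The function names are hypothetical.

```python
import numpy as np


def preprocess_shape(old_h, old_w, long_side=1024):
    """Size after resizing so the longest side equals long_side (aspect ratio kept)."""
    scale = long_side / max(old_h, old_w)
    return int(old_h * scale + 0.5), int(old_w * scale + 0.5)


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of the last two axes (stand-in for bilinear)."""
    in_h, in_w = x.shape[-2:]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return x[..., rows[:, None], cols[None, :]]


def postprocess_masks(low_res_logits, original_size, img_size=1024, mask_threshold=0.0):
    """(C, 256, 256) mask logits -> (C, H, W) boolean masks at the original image size."""
    reshaped_h, reshaped_w = preprocess_shape(*original_size, long_side=img_size)
    x = resize_nearest(low_res_logits, img_size, img_size)  # back to the padded 1024x1024 frame
    x = x[..., :reshaped_h, :reshaped_w]                      # drop the bottom/right padding
    x = resize_nearest(x, *original_size)                     # undo the longest-side resize
    return x > mask_threshold                                 # positive logits = inside the mask
```

For a 1200×1800 photo, `preprocess_shape(1200, 1800)` gives `(683, 1024)`; in the Transformers example this should correspond to `inputs["reshaped_input_sizes"]`.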

🧪 Automatic mask generation (zero-shot mode)

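```python
from transformers import pipeline

# device=0 selects the first GPU; points_per_batch trades GPU memory for speed
generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0)

outputs = generator("your_image_path.jpg", points_per_batch=256)
masks = outputs["masks"]    # one binary mask per detected object
scores = outputs["scores"]  # predicted IoU score of each mask
```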
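Automatic mode is still prompt-driven. The official `SamAutomaticMaskGenerator` prompts the model with a regular grid of points (32×32 = 1024 by default), runs them through the decoder in batches of `points_per_batch`, keeps a mask only if its predicted IoU and its stability score are high enough, and finally removes duplicates with non-maximum suppression. The NumPy sketch below reproduces only the grid, batching, and stability-score pieces, mirroring the official helpers; that the Transformers pipeline's `points_per_batch` plays the same role is an assumption, and the function names are illustrative.

```python
import numpy as np


def build_point_grid(n_per_side=32):
    """n_per_side**2 prompt points at cell centres, in normalized (x, y) coordinates."""
    offset = 1 / (2 * n_per_side)
    side = np.linspace(offset, 1 - offset, n_per_side)
    xs = np.tile(side[None, :], (n_per_side, 1))
    ys = np.tile(side[:, None], (1, n_per_side))
    return np.stack([xs, ys], axis=-1).reshape(-1, 2)


def batches(points, points_per_batch):
    """Yield consecutive chunks; each chunk is one pass through the prompt encoder and decoder."""
    for start in range(0, len(points), points_per_batch):
        yield points[start:start + points_per_batch]


def stability_score(logits, mask_threshold=0.0, offset=1.0):
    """IoU between the masks obtained by thresholding logits at +offset and -offset.

    A stable mask barely changes when the cut-off moves, so its score stays close to 1.
    """
    inter = (logits > mask_threshold + offset).sum(axis=(-2, -1))
    union = (logits > mask_threshold - offset).sum(axis=(-2, -1))
    return inter / np.maximum(union, 1)  # guard against empty masks (not in the original)


grid = build_point_grid(32)                    # (1024, 2)
pixel_points = grid * np.array([1920, 1080])   # x scaled by width, y by height (example size)
print(len(list(batches(pixel_points, 256))))   # -> 4
```

With `points_per_batch=256` the 1024 grid points need 4 decoder passes instead of 16 at the official default of 64, which is why a larger value is faster but uses more GPU memory.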

🎯 Use cases

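High-precision object segmentation (medical imaging, industrial inspection; specialized domains such as medical images may need fine-tuning)
Image annotation tools (with Gradio, ComfyUI, etc.)
Segmenting every object in natural images (no annotations required)
Stable Diffusion workflows combined with LoRA, ControlNet, etc., e.g. supplying masks for region-targeted inpainting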
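For the Stable Diffusion use case, a SAM mask is usually converted into an inpainting mask: white where the image should be repainted, black where it should be kept, often grown by a few pixels so the repaint blends past the object's edge. A dependency-light NumPy sketch under those assumptions; `grow_mask` and `to_inpaint_mask` are hypothetical names, and the square dilation kernel is a simplification of the round kernels some tools use.

```python
import numpy as np


def grow_mask(mask, radius):
    """Binary dilation with a (2*radius+1)^2 square kernel: each True pixel spreads `radius` px."""
    if radius <= 0:
        return mask.copy()
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]  # OR in every shifted copy of the mask
    return out


def to_inpaint_mask(mask, radius=8):
    """Boolean SAM mask -> uint8 image mask (255 = repaint, 0 = keep)."""
    return grow_mask(mask.astype(bool), radius).astype(np.uint8) * 255
```

The resulting array can be saved as a grayscale image and fed to an inpainting pipeline or node that follows the white-means-repaint convention.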

📝 Citation (BibTeX)

```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```

📁 Recommended location (ComfyUI users)

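ComfyUI/models/sam/sam_vit_h_4b8939.pth

The exact folder depends on the custom node that loads SAM (some nodes look in `ComfyUI/models/sams/` instead), so check that node's README.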