BLIP2 Deployment Tutorial

CoderXiu

Last modified 2024-12-19 17:09:43


Tags: python, artificial intelligence, computer vision

First published 2024-10-25 13:21:47

Article link: https://blog.csdn.net/shaxiu0213/article/details/143230106


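A short record of how I deployed BLIP2.
The main problem was network restrictions (the Great Firewall), which prevented the model weights from downloading.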

Environment setup

This post deploys BLIP2 with LAVIS, Salesforce's language-vision library.

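1. Install LAVIS with pip. Switching pip to the Tsinghua mirror (add -i https://pypi.tuna.tsinghua.edu.cn/simple) makes the download faster.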

pip install salesforce-lavis

Run the code below to check whether LAVIS installed correctly:

from lavis.models import model_zoo
print(model_zoo)

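Note: at this step you may get RuntimeError: module compiled against ABI version 0x1000009 but this version of numpy is 0x2000000, followed by a traceback. It means a compiled extension module was built against NumPy 1.x while NumPy 2.x is installed.
Fix: uninstall numpy, salesforce-lavis and opencv-python, then reinstall salesforce-lavis and opencv-python with pip.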
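To check for this mismatch before running anything heavy, the sketch below reads the installed NumPy version from package metadata (so no compiled extension is imported) and, if it is 2.x, prints the uninstall/reinstall commands from the fix above. It uses only the standard library; the helper names are hypothetical, and the Tsinghua index is optional.

```python
import sys
from importlib import metadata
from typing import List, Optional

# Tsinghua PyPI mirror from step 1 (optional; pass index_url=None for the default index)
TSINGHUA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_numpy2(version: Optional[str]) -> bool:
    """True if the version string is NumPy 2.x or later."""
    if version is None:
        return False
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 2


def repair_commands(index_url: Optional[str] = TSINGHUA_INDEX) -> List[List[str]]:
    """pip commands for the fix: uninstall three packages, then reinstall two."""
    pip = [sys.executable, "-m", "pip"]
    uninstall = pip + ["uninstall", "-y", "numpy", "salesforce-lavis", "opencv-python"]
    install = pip + ["install", "salesforce-lavis", "opencv-python"]
    if index_url:
        install += ["-i", index_url]
    return [uninstall, install]


# Only print the commands; run them yourself after reviewing
if is_numpy2(installed_version("numpy")):
    for cmd in repair_commands():
        print(" ".join(cmd))
```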

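2. Replace the weight file paths inside the installed library
Take the Image Captioning task as an example. The config below belongs to the original BLIP captioning model (blip_caption); BLIP-2 models have their own YAML files (under lavis/configs/models/blip2/ in the LAVIS repository) and are handled the same way.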

Edit lavis/configs/models/blip_caption_base_coco.yaml.
This file lives inside the installed lavis Python package; if you installed LAVIS from git, edit the same file in the cloned project instead.
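If you are not sure where pip put the package (pip show salesforce-lavis also prints its Location), the sketch below finds the installed lavis directory with importlib.util.find_spec, which locates a top-level package without importing it, and builds the config path. The function names are hypothetical; the configs/models layout is taken from the path above.

```python
import importlib.util
import os
from typing import Optional


def package_dir(package: str) -> Optional[str]:
    """Directory of an installed package, found without importing it (None if missing)."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def lavis_model_config(filename: str = "blip_caption_base_coco.yaml") -> Optional[str]:
    """Absolute path of a model config inside the installed lavis package, if it exists."""
    root = package_dir("lavis")
    if root is None:
        return None
    path = os.path.join(root, "configs", "models", filename)
    return path if os.path.isfile(path) else None


print(lavis_model_config())
```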

model:
  arch: blip_caption
  load_finetuned: True

  pretrained: "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth"
  finetuned: "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth"

  # vit encoder
  vit_type: "base"
  vit_grad_ckpt: False
  vit_ckpt_layer: 0
  image_size: 384

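Download the weight files referenced by pretrained and finetuned to your machine, then replace each URL with the absolute path of the downloaded file.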
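The download-and-replace step can be scripted. This is a minimal sketch, assuming the YAML uses the double-quoted `key: "URL"` form shown above and that the machine doing the download can reach storage.googleapis.com; the helper names and the weights directory are hypothetical. It rewrites only the pretrained/finetuned lines with a regex, so comments and the other keys stay untouched.

```python
import os
import re
import urllib.parse
import urllib.request


def local_path_for(url: str, weights_dir: str) -> str:
    """Absolute local path for a checkpoint URL: weights_dir/<file name from the URL>."""
    name = os.path.basename(urllib.parse.urlparse(url).path)
    path = os.path.abspath(os.path.join(weights_dir, name))
    # Forward slashes avoid YAML escape issues with Windows backslashes
    return path.replace("\\", "/")


def download(url: str, weights_dir: str) -> str:
    """Download url into weights_dir unless the file already exists; return its path."""
    path = local_path_for(url, weights_dir)
    if not os.path.exists(path):
        os.makedirs(weights_dir, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path


# Matches lines such as:   pretrained: "https://..."   (indent and key are captured)
_KEY_URL = re.compile(
    r'^([ \t]*(?:pretrained|finetuned):[ \t]*)"(https?://[^"]+)"', re.MULTILINE
)


def localize_config(yaml_text: str, weights_dir: str, fetch: bool = False) -> str:
    """Replace pretrained/finetuned URLs with absolute local paths (optionally downloading)."""

    def _sub(m) -> str:
        url = m.group(2)
        path = download(url, weights_dir) if fetch else local_path_for(url, weights_dir)
        return f'{m.group(1)}"{path}"'

    return _KEY_URL.sub(_sub, yaml_text)
```

Usage: read blip_caption_base_coco.yaml, call localize_config(text, "/data/lavis_weights", fetch=True), and write the result back, keeping a copy of the original file.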

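Running the script may then fail with: OSError: Can't load tokenizer for 'bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct path to a dir
This happens because the BLIP model also loads the bert-base-uncased tokenizer from Hugging Face, which is blocked as well.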

Run the script from the command line through a Hugging Face mirror so that the BERT weights are downloaded:
HF_ENDPOINT=https://hf-mirror.com python test_script.py
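The `VAR=value command` syntax works in bash/zsh but not in Windows cmd. A portable sketch launches the script from Python with HF_ENDPOINT set only for the child process; the function names are hypothetical. Setting os.environ["HF_ENDPOINT"] at the top of the test script itself should also work, as far as I know, but only if it happens before transformers or huggingface_hub is imported, since the endpoint is read at import time.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

HF_MIRROR = "https://hf-mirror.com"


def mirror_env(base: Optional[Mapping[str, str]] = None, endpoint: str = HF_MIRROR) -> Dict[str, str]:
    """Copy of an environment (default: the current one) with HF_ENDPOINT set."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env


def run_with_mirror(script: str, endpoint: str = HF_MIRROR) -> int:
    """Run `python script` in a child process through the mirror; return its exit code."""
    result = subprocess.run([sys.executable, script], env=mirror_env(endpoint=endpoint))
    return result.returncode
```

Usage: run_with_mirror("test_script.py") behaves like the shell one-liner above.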

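The general idea: whenever a model weight file is missing, download it and replace its reference with the file's absolute local path.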
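To see which configs still point at remote weights, a hypothetical scanner can walk the lavis configs directory and list quoted http(s) URLs ending in .pth, .pt or .bin. It is a sketch under those assumptions: unquoted URLs are not matched, and weights that are fetched by name through Hugging Face (such as bert-base-uncased) never appear in the YAML, so they still need the mirror.

```python
import os
import re
from typing import Dict, List

# Quoted http(s) URLs that end in a typical checkpoint extension
_REMOTE_WEIGHT = re.compile(r'"(https?://[^"]+\.(?:pth|pt|bin))"')


def remote_weights(yaml_text: str) -> List[str]:
    """Checkpoint URLs that a config would still download from the network."""
    return _REMOTE_WEIGHT.findall(yaml_text)


def scan_configs(config_root: str) -> Dict[str, List[str]]:
    """Map every YAML file under config_root to its remaining remote checkpoint URLs."""
    found: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(config_root):
        for name in sorted(filenames):
            if name.endswith((".yaml", ".yml")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    urls = remote_weights(f.read())
                if urls:
                    found[path] = urls
    return found
```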

Appendix: test script for the Image Captioning task

import torch
from lavis.models import load_model_and_preprocess
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# loads BLIP caption base model, with finetuned checkpoints on MSCOCO captioning dataset.
# this also loads the associated image processors
model, vis_processors, _ = load_model_and_preprocess(name="blip_caption", model_type="base_coco", is_eval=True, device=device)
# preprocess the image
# vis_processors stores image transforms for "train" and "eval" (validation / testing / inference)
raw_image = Image.open("./merlion.png").convert("RGB")

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
# generate caption
res = model.generate({"image": image})
print(res)
# ['a large fountain spewing water into the air']
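The script above loads BLIP (blip_caption), not BLIP-2. A BLIP-2 variant is sketched below, assuming the model name blip2_opt with model_type caption_coco_opt2.7b and the "Question: ... Answer:" prompt format from the LAVIS README; check print(model_zoo) for what your version provides. It also assumes the BLIP-2 YAML checkpoints have been localized the same way. That model also pulls the OPT-2.7B language model from Hugging Face, so HF_ENDPOINT applies, and it is large enough that a GPU is likely needed. The function names are hypothetical.

```python
from typing import Optional


def blip2_prompt(question: Optional[str] = None) -> Optional[str]:
    """Prompt in the README's 'Question: ... Answer:' style; None means plain captioning."""
    return None if question is None else f"Question: {question} Answer:"


def blip2_caption(image_path: str, question: Optional[str] = None):
    # Heavy imports live inside the function so this file imports without torch/lavis
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Assumed model zoo names; verify them with print(model_zoo)
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=True, device=device
    )
    raw_image = Image.open(image_path).convert("RGB")
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    samples = {"image": image}
    prompt = blip2_prompt(question)
    if prompt is not None:
        samples["prompt"] = prompt
    return model.generate(samples)
```

Usage: blip2_caption("./merlion.png") for a caption, or blip2_caption("./merlion.png", "which city is this?") for a question.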