[Model deployment] Pix2struct for widget captioning
Hugging Face model page: https://huggingface.co/google/pix2struct-widget-captioning-large
Reading notes: click here
Environment dependencies
import traceback

from PIL import Image
from transformers import Pix2StructForConditionalGeneration
from transformers import Pix2StructProcessor
Make sure that transformers>=4.30.
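A quick way to confirm the requirement is to compare `transformers.__version__` against the minimum. A minimal sketch (`meets_minimum` is a helper written for this note, not a library function; the version strings are hypothetical):

```python
# Minimal sketch: compare a version string against a required minimum.
# Only the major.minor components are considered.
def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if `version` (e.g. "4.30.2") is at least `minimum`."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1])) >= minimum

# Hypothetical version strings for illustration:
print(meets_minimum("4.30.2", (4, 30)))  # True
print(meets_minimum("4.29.1", (4, 30)))  # False
```

In practice you would pass `transformers.__version__` as the first argument.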
Define the image path, the output file path, and the model path:
IMAGE_PATH = "###"
OUTPUT_FILE = "###"
MODEL_PATH = "###"
def main():
    try:
        # Load the image
        image = Image.open(IMAGE_PATH).convert("RGB")
        print("Image loading complete.")

        # Load the model and processor
        model = Pix2StructForConditionalGeneration.from_pretrained(MODEL_PATH)
        processor = Pix2StructProcessor.from_pretrained(MODEL_PATH)

        # Preprocess the image (no header text: plain widget captioning)
        inputs = processor(images=image, return_tensors="pt")
        predictions = model.generate(**inputs)

        # Decode the prediction
        decoded_output = processor.decode(predictions[0], skip_special_tokens=True)
        print(decoded_output)

        # Create the output file, truncating any previous content
        with open(OUTPUT_FILE, 'w') as f:
            f.write(f"Output:\n{decoded_output}\n")
    except Exception as e:
        # On failure, write the error message and traceback to the output file
        with open(OUTPUT_FILE, 'w') as f:
            error_message = f"Error: {str(e)}\n"
            error_trace = traceback.format_exc()
            print(error_message)
            print(error_trace)
            f.write(error_message)
            f.write(error_trace)

if __name__ == "__main__":
    main()
Running this code performs inference successfully.
Some discussion
Note that for conditional generation, a header text must be provided in addition to the image input. Running widget captioning as conditional generation without one fails with: "ValueError: A header text must be provided for VQA models."
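The header goes in through the processor's `text=` argument, which Pix2StructProcessor renders onto the image before patch extraction. A sketch of the call pattern (`build_inputs` is a hypothetical wrapper written for this note, not part of transformers):

```python
def build_inputs(processor, image, header=None):
    """Build Pix2Struct inputs; pass `header` only for conditional generation.

    `processor` is expected to behave like Pix2StructProcessor, which for
    VQA-style checkpoints raises a ValueError when no header text is given.
    """
    if header is None:
        # Plain widget captioning: image only.
        return processor(images=image, return_tensors="pt")
    # Conditional generation: the header is supplied via `text=`.
    return processor(images=image, text=header, return_tensors="pt")

# Usage against the real model would look like:
# inputs = build_inputs(processor, image, header="describe the highlighted widget")
# predictions = model.generate(**inputs)
```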
The Hugging Face model card does not explain how to supply the bounding box that localizes the target widget; it only says usage should match pix2struct-textcaps-base, which is in fact not the case.
TODO: pass a bounding box instead of a question as the header.
While reproducing this, the author located the dataset and searched for how the bounding box is encoded, but found no working solution.