Qwen2-VL: No chat template is set for this processor. Please either set the `chat_template` attribute

Troubleshooting merged-model inference issues after fine-tuning a large model with the peft library

First, merge the LoRA adapter into the base model:

from transformers import AutoTokenizer, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch

base_model_path = "E:\ModelScopeModels\hub\Qwen\Qwen2-VL-2B-Instruct"
lora_model_path = "E:\LLm_finetuning\Qwen2-VL-2B-LatexOCR\checkpoint-124"

merged_model_path = "E:\ModelScopeModels\hub\Qwen\Qwen2-VL-2B-Instruct-LaTexOCR-merge"
#加载模型和分词器
model = Qwen2VLForConditionalGeneration.from_pretrained(base_model_path,torch_dtype=torch.float16,trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
print(model)

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(model, lora_model_path)
# Merge the adapter weights into the base weights
merged_model = model.merge_and_unload()

# Save the merged model and tokenizer
merged_model.save_pretrained(merged_model_path)
tokenizer.save_pretrained(merged_model_path)
print(f"merged model saved to {merged_model_path}")

Inference:

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# System prompt used during fine-tuning (roughly: "You are a LaTeX OCR
# assistant; read the image the user provides and convert it into a
# LaTeX formula.")
prompt = "你是一个LaText OCR助手,目标是读取用户输入的照片,转换成LaTex公式。"
merged_model_path = r"xxxxxx\Qwen\Qwen2-VL-2B-Instruct-merge"
test_image_path = "997.jpg"

# No PeftModel or LoraConfig is needed at this point: the LoRA weights
# were already merged into the checkpoint saved above, so the merged
# model loads like any plain transformers checkpoint.

# default: Load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    merged_model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(merged_model_path)
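# Optional diagnostic (not part of the original script): if
# chat_template.json is missing from the merged folder, the processor
# may load without a template and apply_chat_template() below fails.
assert processor.chat_template is not None, (
    "chat_template.json missing from merged model folder"
)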
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": test_image_path,
                "resized_height": 100,
                "resized_width": 500,
            },
            {"type": "text", "text": f"{prompt}"},
        ],
    }
]
# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])

After merging, running inference with the merged model may raise the following error: No chat template is set for this processor.


The cause is that the chat_template.json file is missing from the merged model folder; copy it from the original base model folder into the merged folder.
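
The copy can be done by hand in the file explorer, or with a few lines of Python. A minimal sketch, reusing base_model_path and merged_model_path from the merge script above:

import os
import shutil

# Copy the chat template that the merge step left behind
src = os.path.join(base_model_path, "chat_template.json")
dst = os.path.join(merged_model_path, "chat_template.json")
shutil.copyfile(src, dst)
print(f"copied {src} -> {dst}")

Alternatively, loading the processor from the base model and calling processor.save_pretrained(merged_model_path) during the merge step should also write chat_template.json in recent transformers versions, making the manual copy unnecessary.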
