Fine-tuning Qwen Large Models

1. Basic Qwen Q&A demo

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen2-7B-Instruct"
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)

2. Fine-tuning

Download LLaMA-Factory

Environment: Python 3.11, CUDA 12.1

cd LLaMA-Factory
pip install -e ".[torch,metrics]"

Use pip install --no-deps -e . to resolve package conflicts.

python src/webui.py

In the web UI, select the model you want to fine-tune and the training data.

Fine-tuning data format

[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]
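For example, a filled-in record in this format might look like the following (the question and answer are illustrative placeholders, not entries from the actual knowledge base):

[
  {
    "instruction": "Do I need an appointment to visit the company upstairs?",
    "input": "",
    "output": "Yes, please register at the front desk and make an appointment in advance.",
    "system": "You are a helpful assistant.",
    "history": []
  }
]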

Data conversion code (Excel to alpaca-format JSON)

import pandas as pd
import json

# Read the Excel file
excel_file_path = 'C:\\Users\\Administrator\\Desktop\\知识库V0.1-英文.xlsx'
df = pd.read_excel(excel_file_path)

# The Excel file is assumed to have two columns: 'Question' and 'Answer'
# Adjust the column names below if yours differ
questions = df['Question']
answers = df['Answer']

# Convert to alpaca format
alpaca_data = []
for question, answer in zip(questions, answers):
    alpaca_item = {
        "instruction": question,
        "input": "",
        "output": answer,
        "system": "",
        "history": []
    }
    alpaca_data.append(alpaca_item)

# Write the result to a JSON file
json_file_path = 'C:\\Users\\Administrator\\Desktop\\data.json'
with open(json_file_path, 'w', encoding='utf-8') as f:
    json.dump(alpaca_data, f, ensure_ascii=False, indent=4)

print(f"转换完成,结果已保存到 {json_file_path}")

Add your dataset to data/dataset_info.json so it can be selected as a training dataset in the web UI:

“our_data": {
"file_name": "data.json"
},
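For context, this entry sits alongside the built-in datasets in data/dataset_info.json. When your JSON keys already match the alpaca fields, file_name alone is enough; if they differ, a columns mapping can be added. The sketch below shows the default alpaca mapping and is optional in this case:

{
  "our_data": {
    "file_name": "data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "system": "system",
      "history": "history"
    }
  }
}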

You can also launch training directly with the following command:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path Qwen/Qwen1.5-7B-Chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset our_data \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1000.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all

Testing after fine-tuning: load the base model and apply the fine-tuned LoRA adapter on top of it.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
model_name = "Qwen/Qwen2-7B-Instruct"
#model_name = "/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400"
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach the LoRA adapter trained with LLaMA-Factory to the base model
model = PeftModel.from_pretrained(model, model_id="/media/dgh/LLaMA-Factory/saves/Qwen2-7B-Chat/lora/train_2024-08-10-14-15-57/checkpoint-4400")

prompt = "是否需要预约才能拜访楼上的公司?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("response:", response)

Model merging: after merging, the merged model can also be run for testing.

CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
    --model_name_or_path Qwen/Qwen1.5-7B-Chat \
    --adapter_name_or_path /media/dgh/LLaMA-Factory/saves/Qwen1.5-7B-Chat/lora/train_2024-08-16-17-21-38/checkpoint-7300 \
    --template qwen \
    --finetuning_type lora \
    --export_dir /media/dgh/Qwen2-main/save \
    --export_size 2 \
    --export_legacy_format False
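As a quick sanity check, the merged weights can be loaded like a regular Hugging Face model (a minimal sketch; the path assumes the export_dir from the command above):

from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "/media/dgh/Qwen2-main/save"  # export_dir used above
model = AutoModelForCausalLM.from_pretrained(merged_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(merged_dir)

messages = [{"role": "user", "content": "是否需要预约才能拜访楼上的公司?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))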

Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

GGUF conversion

python convert_hf_to_gguf.py /media/dgh/Qwen2-main/save --outfile /media/dgh/Qwen2-main/7b_guuf/qwen2-7b-instruct-fp16.gguf

Quantize the GGUF model. Available quantization types include q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, and q8_0. See the llama.cpp documentation for more information.

q4_0

./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q4_0_gguf/qwen1.5-14b-q4_0.gguf q4_0

q5_k_m

./llama-quantize /home/dgh/LLaMA-Factory/saves/Qwen1.5-14b_guuf/Qwen1.5-14B-Chat-F16.gguf /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf q5_k_m

Run

./llama-cli -m /home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf \
-n 512 -co -i -if -f prompts/chat-with-qwen.txt \
--in-prefix "<|im_start|>user\n" \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
-ngl 80 -fa
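Besides the interactive CLI, the quantized GGUF file can also be called from Python if the llama-cpp-python bindings are installed (a sketch under that assumption; pip install llama-cpp-python is required, and the chat template is read from the GGUF metadata):

from llama_cpp import Llama

# Load the quantized model; n_gpu_layers offloads layers to the GPU like -ngl above
llm = Llama(
    model_path="/home/dgh/LLaMA-Factory/saves/qwen1.5-14b-q5_k_m_gguf/qwen1.5-14b-q5_k_m.gguf",
    n_gpu_layers=80,
    n_ctx=4096,
)
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "是否需要预约才能拜访楼上的公司?"},
    ],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])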

Calling online models via Alibaba Bailian (大模型服务平台百炼)

On Alibaba Bailian you can build your own agent with zero code; the API is called as follows:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-turbo",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '你是谁?'}
    ],
    temperature=0.8
)

print(completion.choices[0].message.content)

The agent you create can also be called through the provided API, as sketched below; see the official usage guide for details.
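The sketch below assumes the DashScope SDK's Application interface for calling a Bailian agent application; YOUR_APP_ID is a placeholder for the application ID shown in the console:

import os
from http import HTTPStatus
from dashscope import Application

response = Application.call(
    app_id="YOUR_APP_ID",  # placeholder: application ID from the Bailian console
    prompt="你是谁?",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)
if response.status_code == HTTPStatus.OK:
    print(response.output.text)
else:
    print(response.message)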

Download the latest LLaMA-Factory, which supports fine-tuning Qwen2-VL.

Error encountered:

ValueError: The checkpoint you are trying to load has model type `qwen2_vl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date

Cause: the transformers release on PyPI does not yet include this architecture; install the development version from GitHub.
Fix: pip install git+https://github.com/huggingface/transformers
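A quick way to confirm the installed version recognizes the architecture (a minimal check):

import transformers
# This import fails on transformers versions without Qwen2-VL support
from transformers import Qwen2VLForConditionalGeneration

print(transformers.__version__)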

Fine-tuning Qwen2-VL with the swift framework (ms-swift)

Reference: Qwen2-VL - Alibaba Cloud Developer Community

Environment: Python 3.8, CUDA 12.1

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
pip install pyav qwen_vl_utils

Training with the officially specified dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

To train with your own dataset, replace the dataset arguments with:

  --dataset train.jsonl \
  --val_dataset val.jsonl \

Dataset format

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "images": []}

Run inference after fine-tuning and merge the LoRA weights:

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/qwen2-vl-7b-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

The merged model can then be called directly:

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
model_dir = "/Qwen2-VL-2B-Instruct/output/v4-20240923/checkpoint-1000-merged"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype = torch.float16)
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)
messages = [{"role": "user", "content": [{"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Multi-image inference

# Messages containing multiple images and a text query
messages = [{"role": "user", "content": [{"type": "image", "image": "file:///path/to/image1.jpg"}, {"type": "image", "image": "file:///path/to/image2.jpg"}, {"type": "text", "text": "Identify the similarities between these images."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)

Video understanding

# Messages containing a video and a text query
messages = [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "Describe this video."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
