QLoRA fine-tuning of the Qwen model

!pip install peft transformers_stream_generator einops tiktoken
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Requirement already satisfied: peft in /root/miniconda3/lib/python3.10/site-packages (0.12.0)
Requirement already satisfied: transformers_stream_generator in /root/miniconda3/lib/python3.10/site-packages (0.0.5)
Requirement already satisfied: einops in /root/miniconda3/lib/python3.10/site-packages (0.8.0)
Collecting tiktoken
  Downloading http://mirrors.aliyun.com/pypi/packages/e7/8c/7d1007557b343d5cf18349802e94d3a14397121e9105b4661f8cd753f9bf/tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 260.9 kB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /root/miniconda3/lib/python3.10/site-packages (from peft) (1.26.3)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.7.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
import os
import json
import torch
from torch.utils.data import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig, TaskType

# Disable wandb
os.environ["WANDB_DISABLED"] = "true"

# Use the second GPU (uncomment to enable)
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"
device_id = 0
#torch.cuda.set_device(device_id)
model_path = "/root/autodl-fs/Qwen-1_8B-Chat/qwen/Qwen-1_8B-Chat"
train_data_file = '/root/autodl-fs/train_data1/训练集示例.json'
test_data_file = '/root/autodl-fs/train_data1/测试集示例.json'

Model training

class CustomDataset(Dataset):
    def __init__(self, data, tokenizer, max_length):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        user_input = item['conversations'][0]['value']
        assistant_response = item['conversations'][1]['value']

        inputs = self.tokenizer(
            user_input, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt'
        )
        labels = self.tokenizer(
            assistant_response, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt'
        )

        input_ids = inputs['input_ids'].squeeze()
        attention_mask = inputs['attention_mask'].squeeze()
        labels = labels['input_ids'].squeeze()

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels
        }
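Note that this dataset tokenizes the prompt and the response independently, so the labels are not position-aligned with the input tokens. A common alternative for causal LM fine-tuning is to concatenate prompt and response into a single sequence and mask the prompt positions with -100 so that only the response contributes to the loss. A minimal sketch of that idea (the helper name and the plain concatenation without Qwen's chat template are assumptions for illustration):

def build_example(tokenizer, user_input, assistant_response, max_length):
    # Tokenize prompt and response separately, then join them into one sequence.
    prompt_ids = tokenizer(user_input, add_special_tokens=False)['input_ids']
    response_ids = tokenizer(assistant_response, add_special_tokens=False)['input_ids']
    input_ids = (prompt_ids + response_ids)[:max_length]
    # Prompt positions are set to -100 so the loss only covers the response tokens.
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_length]
    pad_len = max_length - len(input_ids)
    attention_mask = [1] * len(input_ids) + [0] * pad_len
    input_ids += [tokenizer.pad_token_id] * pad_len
    labels += [-100] * pad_len
    return {
        'input_ids': torch.tensor(input_ids),
        'attention_mask': torch.tensor(attention_mask),
        'labels': torch.tensor(labels),
    }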

# Load the local model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True, pad_token='<|endoftext|>', padding_side="right"
)
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# Load the dataset
train_data = json.load(open(train_data_file, 'r', encoding='utf-8'))

# Create the custom dataset
train_dataset = CustomDataset(train_data, tokenizer, max_length=512)
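For reference, CustomDataset expects each entry of the training JSON to carry a conversations list whose first element is the user turn and whose second is the assistant turn. A minimal illustrative record (keys other than conversations and value are not read by the code above, so the "from" fields are only an assumption about the file layout):

# Assumed shape of one training record:
example_item = {
    "conversations": [
        {"from": "user", "value": "example question"},
        {"from": "assistant", "value": "example answer"},
    ]
}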

# Move the model to the GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Configure QLoRA
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj", "w1", "w2"]  # modules must match the model architecture (e.g. ["q_proj", "v_proj"] for Llama-style models)
)
model = get_peft_model(model, peft_config)
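Strictly speaking, the configuration above attaches standard LoRA adapters to a full-precision model. QLoRA additionally loads the base model quantized to 4-bit via bitsandbytes before adding the adapters. A minimal sketch of that loading step, assuming bitsandbytes is installed (it would replace the earlier from_pretrained call):

# Sketch: load the base model in 4-bit (NF4) and prepare it for k-bit training.
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # confirm that only the LoRA adapters are trainable

With device_map="auto" the quantized weights are placed on the GPU automatically, so the explicit model.to(device) call above becomes unnecessary.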

# Set the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="no",  # no evaluation during training
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=10,
    weight_decay=0.01,
)
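When the base model is loaded in 4-bit as sketched earlier, the training arguments are often extended with bf16 training and a paged optimizer. An optional variant (not required for the plain LoRA run shown here):

# Optional QLoRA-style variant of the training arguments:
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="no",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=10,
    weight_decay=0.01,
    bf16=True,                  # matches bnb_4bit_compute_dtype above
    optim="paged_adamw_8bit",   # paged optimizer commonly paired with QLoRA
    logging_steps=1,
)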

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

# Start training
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
/root/miniconda3/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(




[10/10 00:01, Epoch 10/10]

TrainOutput(global_step=10, training_loss=8.084375, metrics={'train_runtime': 1.8396, 'train_samples_per_second': 10.872, 'train_steps_per_second': 5.436, 'total_flos': 94148996628480.0, 'train_loss': 8.084375, 'epoch': 10.0})
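After training, it is worth persisting the LoRA adapter so it can be re-attached to the base model later; a minimal sketch (the directory name is arbitrary):

# Save only the adapter weights (a few MB) plus the tokenizer.
model.save_pretrained('./qwen_lora_adapter')
tokenizer.save_pretrained('./qwen_lora_adapter')

# To reload for inference later:
# from peft import PeftModel
# base = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
# model = PeftModel.from_pretrained(base, './qwen_lora_adapter')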

Evaluation

# Load the test set
test_data = json.load(open(test_data_file, 'r', encoding='utf-8'))

# Make sure the model is on the specified GPU
device = torch.device(f"cuda:{device_id}" if torch.cuda.is_available() else "cpu")
model.to(device)

# Inference function
def generate_response(model, tokenizer, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=512, num_return_sequences=1)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Run inference and store the results
for item in test_data:
    user_input = item['conversations'][0]['value']
    response = generate_response(model, tokenizer, user_input)
    item['conversations'][1]['value'] = response


Both `max_new_tokens` (=512) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
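This warning appears because Qwen's bundled generation_config already defines max_new_tokens, which then overrides the max_length passed above. To control the output length explicitly (and silence the warning), pass max_new_tokens and the attention mask instead; a small variant of the function above:

def generate_response(model, tokenizer, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=512,        # generate at most 512 new tokens after the prompt
        num_return_sequences=1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)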

Write the validation file

with open('train_data1/output.json', 'w', encoding='utf-8') as f:
    json.dump(test_data, f, ensure_ascii=False, indent=4)