Fine-Tuning in Practice: Fine-Tuning QwQ-32B 4-bit with Unsloth (Single RTX 4090)

This article is based on the video tutorial from 赋范课堂: "QwQ-32B efficient fine-tuning with only 20 GB of VRAM! Four fine-tuning tools explained in depth! Knowledge injection + Q&A-style fine-tuning, fine-tuning DeepSeek R1-style reasoning models + hands-on CoT dataset creation to build a custom LLM!"
https://www.bilibili.com/video/BV1YoQoYQEwF/
Course materials: https://kq4b3vgg5b.feishu.cn/wiki/LxI9wmuFmiaLCkkoiCIcKvOan7Q
This article has been edited and abridged from that material.

赋范课堂 offers excellent courses that are well worth watching.



I. Basic Preparation

1. Install unsloth

pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

2. Install and register wandb

wandb is similar to TensorBoard, but more stable.

Sign up: https://wandb.ai/site
API key: https://wandb.ai/authorize (shown after you log in)

For registration and usage details, see: https://blog.csdn.net/lovechris00/article/details/146437418


Install the library

pip install wandb

Log in and enter your API key

wandb login
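Alternatively, for non-interactive use, wandb also reads the API key from the WANDB_API_KEY environment variable (the value below is a placeholder; use your own key):

export WANDB_API_KEY=your-api-key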

3. Download the model

https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit


Install huggingface_hub
pip install huggingface_hub

Use screen to start a persistent session

Downloading the model can take 0.5-1 hour; a persistent session prevents the download from being interrupted if the terminal session is closed.


Install screen

sudo apt install screen

screen -S qwq
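If the connection drops, the download keeps running inside the screen session; detach with Ctrl-A then D, and reattach later with:

screen -r qwq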

Configure a China mirror for Hugging Face downloads

On Linux, add this environment variable to ~/.bashrc:

export HF_ENDPOINT='https://hf-mirror.com' 

Download the model
huggingface-cli download --resume-download  unsloth/QwQ-32B-unsloth-bnb-4bit

Change the default download location

Models are downloaded to ~/.cache/huggingface/hub/ by default; to store them elsewhere, set HF_HOME:

export HF_HOME="/root/xx/HF_download"

II. Model Invocation Test

Calling the model with modelscope

from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)


prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Calling the model with Ollama

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]

response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)

print(response.choices[0].message.content)


Model registration
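A minimal sketch of registering a local GGUF build of the model with Ollama (the GGUF path below is a placeholder, not from the original article; the model name matches the one used in the request code above). Create a Modelfile containing:

FROM ./qwq-32b-q4_k_m.gguf

Then register it:

ollama create qwq-32b-bnb -f Modelfile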



Check that registration succeeded

ollama list 

Send a request with the openai library

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]

response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)

print(response.choices[0].message.content)


Calling the model with vLLM

vllm serve /root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit \
--quantization bitsandbytes \
--load-format bitsandbytes \
--max-model-len 2048

Request test
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]

response = client.chat.completions.create(
    model="/root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit",
    messages=messages,
)

print(response.choices[0].message.content)

III. Download the Fine-Tuning Dataset

Response structure of reasoning models and requirements for the fine-tuning dataset

Like DeepSeek R1, QwQ-32B surfaces its reasoning process directly in the response: the output contains both a reasoning section and a final answer section, and the reasoning section is delimited by <think>...</think> tags, special tokens injected during model training.
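For example, a response typically has roughly this shape (an illustrative sketch, not actual model output):

<think>
... step-by-step reasoning ...
</think>

... final answer ...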


Download the NuminaMath CoT dataset

https://huggingface.co/datasets/AI-MO/NuminaMath-CoT

huggingface-cli download AI-MO/NuminaMath-CoT --repo-type dataset

Besides NuminaMath CoT, there are other CoT datasets such as APPs (coding), TACO (coding), and long_form_thought_data_5k (general Q&A), all of which can be used to fine-tune reasoning models. For an introduction to these datasets, see the open course "Model Distillation with DeepSeek R1: A Hands-On Introduction" | https://www.bilibili.com/video/BV1X1FoeBEgW/



Download the medical-o1-reasoning-SFT dataset

https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

huggingface-cli download FreedomIntelligence/medical-o1-reasoning-SFT --repo-type dataset

You can also download it with the Python datasets library:

from datasets import load_dataset

# Downloading just the first 500 examples is enough for this experiment
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)

# Inspect the dataset
dataset[0]

IV. Load the Model

from unsloth import FastLanguageModel 

max_seq_length = 2048 
dtype = None 
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)


GPU memory usage at this point: 22016 MB


V. Tests Before Fine-Tuning

Inspect the model

>>> model
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 5120, padding_idx=151654)
    (layers): ModuleList(
      (0): Qwen2DecoderLayer(
        ...
      (62): Qwen2DecoderLayer(
        ...
      )
      (63): Qwen2DecoderLayer(
        ...
    )
    (norm): Qwen2RMSNorm((5120,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)

Tokenizer info

>>> tokenizer
Qwen2TokenizerFast(name_or_path='unsloth/QwQ-32B-unsloth-bnb-4bit', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  ...
	151667: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	151668: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)

Basic Q&A test

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Answer using the chat prompt template

prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
***
### Instruction:
你是一名助人为乐的助手。
***
### Question:
{}
***
### Response:
<think>{}"""

question = "你好,好久不见!"
prompt = [prompt_style_chat.format(question, "")] 

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=2048,
    use_cache=True,
)

# GPU memory usage rises to 22412 MB
'''
>>> outputs
tensor([[ 14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949, 105051,
          ...
          35946, 106128,  99245, 101037,  11319, 144236, 151645]],
       device='cuda:0')
'''

response = tokenizer.batch_decode(outputs)
# response --> ['请写出一个恰当的回答来完成当前对话任务。\n***\n### Instruction:\n你是一名助人为乐的助手。\n***\n### Question:\n你好,好久不见!\n***\n### Response:\n<think>:\n好的,用户发来问候“你好,好久不见!”,我需要回应并延续对话。首先,应该友好回应他们的问候,比如“你好!确实很久没联系了,希望你一切都好!”这样既回应了对方,也表达了关心。接下来,可能需要询问对方近况,或者引导对话继续下去。比如可以问:“最近有什么新鲜事吗?或者你有什么需要帮助的吗?”这样可以让对话更自然,也符合助人为乐的角色设定。还要注意语气要亲切,保持口语化,避免过于正式。另外,用户可能希望得到情感上的回应,所以需要体现出关心和愿意帮助的态度。检查有没有语法错误,确保句子流畅。最后,确定回应简洁但足够友好,符合对话的流程。\n</think>\n\n你好!确实好久不见了,希望你一切都好!最近有什么新鲜事分享,或者需要我帮忙什么吗?😊<|im_end|>']

print(response[0].split("### Response:")[1])


A more complex test

question = "请证明根号2是无理数。"

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

# GPU memory usage: 22552 MiB

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


Medical Q&A with the original model

# Redefine the prompt template
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>{}"""

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"


inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")

outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

response1 = tokenizer.batch_decode(outputs1)

print(response1[0].split("### Response:")[1])

 

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")

outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU 22842 MiB 

response2 = tokenizer.batch_decode(outputs2)

print(response2[0].split("### Response:")[1])

VI. Minimum Viable Experiment

Next, we attempt the actual fine-tuning.

With this dataset, we can fine-tune on a subset of the original data, or on the full data over several epochs.

For most fine-tuning experiments, it is best to start with a minimum viable experiment: fine-tune on a small amount of data first and observe the effect.

If that runs smoothly and shows an improvement, then consider a larger run with more data.


Define the prompt template

import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'  



Define the dataset processing function

It reshapes the medical-o1-reasoning-SFT dataset by inserting the Complex_CoT and Response columns into the prompt template and appending the end-of-text token:

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

Prepare the data

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)  
''' 
{
	'Question': 'A 61-year-old ... contractions?',
	'Complex_CoT': "Okay, let's ... incontinence.",
	'Response': 'Cystometry in ... the test.' 
}
'''

# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,) 

# Inspect the formatted text
dataset["text"][0]
'''
Below is an instruction that ... response.
***
### Instruction:
You are a medical ... medical question. 
***
### Question:
A 61-year-old woman ... contractions?
***
### Response:
<think>
Okay,...Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry ... is primarily related to physical e
'''

Start fine-tuning

model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length, 
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)


Notes on the fine-tuning setup

This code uses SFTTrainer for supervised fine-tuning (SFT) of models in the transformers / Unsloth ecosystem:

Related libraries
  • SFTTrainer (from the trl library):
    • trl (Transformer Reinforcement Learning) is a Hugging Face library that provides supervised fine-tuning (SFT) and reinforcement learning (RLHF) functionality.
    • SFTTrainer is mainly used for supervised fine-tuning and works with low-rank adaptation methods such as LoRA.
  • TrainingArguments (from the transformers library):
    • This class defines the training hyperparameters, such as batch size, learning rate, optimizer, and number of training steps.
  • is_bfloat16_supported() (from unsloth):
    • Checks whether the current GPU supports bfloat16 (BF16); returns True if it does, otherwise False.
    • bfloat16 is a more efficient numeric format that performs better on newer NVIDIA GPUs such as the A100/H100.

Fine-tuning parameter breakdown

SFTTrainer arguments
  • model=model: the pretrained model to fine-tune
  • tokenizer=tokenizer: the tokenizer used to process the text data
  • train_dataset=dataset: the training dataset
  • dataset_text_field="text": which dataset column holds the training text (produced by formatting_prompts_func)
  • max_seq_length=max_seq_length: maximum sequence length, caps the number of input tokens
  • dataset_num_proc=2: number of parallel processes for data preprocessing

TrainingArguments
  • per_device_train_batch_size=2: training batch size per GPU/device (small values suit large models)
  • gradient_accumulation_steps=4: gradient accumulation steps (effective batch size = 2 × 4 = 8)
  • warmup_steps=5: warmup steps (the learning rate starts low and ramps up)
  • max_steps=60: maximum number of training steps (here roughly 60 × 8 = 480 training examples are consumed)
  • learning_rate=2e-4: learning rate (2e-4 = 0.0002), controls the size of weight updates
  • fp16=not is_bfloat16_supported(): use fp16 (16-bit floats) if the GPU does not support bfloat16
  • bf16=is_bfloat16_supported(): enable bfloat16 if the GPU supports it (more stable training)
  • logging_steps=10: log training metrics every 10 steps
  • optim="adamw_8bit": use the 8-bit AdamW optimizer to reduce VRAM usage
  • weight_decay=0.01: weight decay (L2 regularization) to reduce overfitting
  • lr_scheduler_type="linear": learning rate schedule (linear decay)
  • seed=3407: random seed for reproducibility
  • output_dir="outputs": output directory for training artifacts

Set up wandb and start fine-tuning

import wandb
wandb.login(key="8c7...242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset', )

# Start fine-tuning
trainer_stats = trainer.train()


If you run into CUDA out of memory, adjust the parameters accordingly.

You can try the following code (for testing only; results are not guaranteed):

import torch
torch.cuda.empty_cache()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" 

from unsloth import FastLanguageModel 

max_seq_length = 1024
dtype = None 
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'  

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:200]",trust_remote_code=True)  


# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,) 

# Set up fine-tuning (LoRA configuration)
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=8,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length, 
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=20,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

import wandb
wandb.login(key="your-wandb-api-key")  # replace with your own wandb API key
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset', )

# Start fine-tuning
trainer_stats = trainer.train()



Check the results

After fine-tuning finishes, Unsloth automatically applies the updated weights (kept in memory), so the fine-tuned model can be called directly without manually merging weights:

trainer_stats
# TrainOutput(global_step=60, training_loss=1.3152311007181803, metrics={'train_runtime': 709.9004, 'train_samples_per_second': 0.676, 'train_steps_per_second': 0.085, 'total_flos': 6.676294205826048e+16, 'train_loss': 1.3152311007181803})

# Switch to inference mode
FastLanguageModel.for_inference(model)

# Check the Q&A results again
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

inputs = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


Merge the model

save_path = 'QwQ-Medical-COT-Tiny'
model.save_pretrained_merged(save_path, tokenizer, save_method = "merged_4bit",) 

Save as GGUF

This makes it convenient to run inference with Ollama (see the example after the export command below).

Exporting and merging takes a while (roughly 20 minutes).

save_path = 'QwQ-Medical-COT-Tiny-GGUF'
model.save_pretrained_gguf(save_path, tokenizer, quantization_method = "q4_k_m") 
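As a rough sketch (not from the original article), the exported GGUF can then be registered with Ollama in the same way as before; the exact .gguf filename that Unsloth writes is an assumption here, so check the QwQ-Medical-COT-Tiny-GGUF directory first:

# Modelfile: point FROM at the actual .gguf file in the export directory
FROM ./QwQ-Medical-COT-Tiny-GGUF/unsloth.Q4_K_M.gguf

ollama create qwq-medical-cot -f Modelfile
ollama run qwq-medical-cot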

VII. Full Efficient Fine-Tuning Experiment

Finally, fine-tune on the full dataset to further improve the results.


# Define the training prompt template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""


EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }


# Load the full dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

# Attach LoRA adapters to the loaded model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)


from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Set num_train_epochs to 3, i.e. iterate over the dataset 3 times:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Map (num_proc=2):   0%| | 0/25371 [00:00<?, ? examples/s] 

trainer_stats = trainer.train()

[ 389/9513 13:44 < 5:24:01, 0.47 it/s, Epoch 0.12/3]

Step    Training Loss
10      1.285900
20      1.262500
...
370     1.201200
380     1.215600

The full run took about 5.6 hours in total (see train_runtime in the stats below).


trainer_stats

TrainOutput(global_step=9513, training_loss=1.0824475168592858, metrics={'train_runtime': 20193.217, 'train_samples_per_second': 3.769, 'train_steps_per_second': 0.471, 'total_flos': 2.7936033274397737e+18, 'train_loss': 1.0824475168592858, 'epoch': 2.9992117294655527})

Testing

Testing again with the two questions, both now produce good answers:


question = "A 61-year-old ... contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


question = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


2025-03-22 (Sat)
