Fine-tuning QwQ-32B (4-bit) with unsloth

GPU: RTX 3090 (24 GB)

Installing unsloth

  • Install via pip (a quick import check follows below)

    pip install unsloth -i https://pypi.mirrors.ustc.edu.cn/simple
    
    source /etc/network_turbo  # enable the cloud host's network proxy so the GitHub install below works
    
    pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
    

    image-20250318225453359
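
    To confirm the install works, run a quick import check (a minimal sketch; any CUDA GPU will do):

    from unsloth import FastLanguageModel  # fails if the install is broken
    import torch
    
    print(torch.cuda.is_available())     # should print True on the 3090
    print(torch.cuda.get_device_name(0))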


Register a Wandb account to monitor the fine-tuning run

  • wandb website

    https://wandb.ai/site

    image-20250322235532991

  • Log in

    Install the client:

    pip install wandb
    

    Log in with your API key:

    wandb login
    

  • Try the official quick-start example

    Notes:

    1. Requires internet access.
    2. Replace the key with your own.
    3. The wandb entity must be created in advance.

    import random
    import wandb
    
    wandb.login(key="api-key")
    
    # Start a new wandb run to track this script.
    run = wandb.init(
        # Set the wandb entity where your project will be logged (generally your team name).
        entity="qinchihongye-pa",
        # Set the wandb project where this run will be logged.
        project="project_test",
        # Track hyperparameters and run metadata.
        config={
            "learning_rate": 0.02,
            "architecture": "CNN",
            "dataset": "CIFAR-100",
            "epochs": 10,
        },
    )
    
    # Simulate training.
    epochs = 10
    offset = random.random() / 5
    for epoch in range(2, epochs):
        acc = 1 - 2**-epoch - random.random() / epoch - offset
        loss = 2**-epoch + random.random() / epoch + offset
        # Log metrics to wandb.
        run.log({"acc": acc, "loss": loss})
    
    # Finish the run and upload any remaining data.
    run.finish()
    

    image-20250323001331600

    image-20250323001400324


Download the quantized QwQ-32B model

  • Hugging Face repo (unsloth's dynamic 4-bit quantization, which loses less precision than Q4_K_M quantization)

    https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit

    Copy the model name:

    unsloth/QwQ-32B-unsloth-bnb-4bit

  • Assume the current working directory is

    /root/lanyun-tmp

  • Create a directory to hold all models downloaded from Hugging Face

    mkdir -p Hugging-Face/QwQ-32B-unsloth-bnb-4bit
    
  • Configure a mirror

    vim ~/.bashrc
    

    Add the following two lines to point Hugging Face at a mirror and to set the default directory where models are saved:

    export HF_ENDPOINT=https://hf-mirror.com
    export HF_HOME=/root/lanyun-tmp/Hugging-Face

    Reload and check that the environment variables took effect:

    source ~/.bashrc
    
    echo $HF_ENDPOINT
    echo $HF_HOME
    
  • Install the official Hugging Face download tool

    pip install -U huggingface_hub
    
  • Run the model download command

    huggingface-cli download --resume-download unsloth/QwQ-32B-unsloth-bnb-4bit --local-dir  /root/lanyun-tmp/Hugging-Face/QwQ-32B-unsloth-bnb-4bit
    
    

    Or download with Python:

    from huggingface_hub import snapshot_download
    snapshot_download(
        repo_id = "unsloth/QwQ-32B-unsloth-bnb-4bit",
        local_dir = "/root/lanyun-tmp/Hugging-Face/QwQ-32B-unsloth-bnb-4bit",
    )
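
    To verify the download completed, list the local directory (a quick check; same path as above):

    import os
    
    local_dir = "/root/lanyun-tmp/Hugging-Face/QwQ-32B-unsloth-bnb-4bit"
    for name in sorted(os.listdir(local_dir)):
        print(name)  # expect config.json, tokenizer files, and *.safetensors shards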
    

Calling the model with the transformers library

  • Code

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "/root/lanyun-tmp/Hugging-Face/QwQ-32B-unsloth-bnb-4bit"
    
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",
        device_map="cuda:0",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    
    prompt = "你好"
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    model_inputs = tokenizer([text]
                             , return_tensors="pt"
                            ).to(model.device)
    
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=32768
    )
    
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    
    response = tokenizer.batch_decode(generated_ids
                                      , skip_special_tokens=True
                                     )[0]
    print(response)
    

    image-20250319224154469

  • VRAM usage: about 23 GB.

    image-20250319224423863
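
    Because max_new_tokens is large, it can take a long time before anything is printed; tokens can be streamed as they are generated instead (a sketch using transformers' TextStreamer, reusing model, tokenizer, and model_inputs from the block above):

    from transformers import TextStreamer
    
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = model.generate(**model_inputs, max_new_tokens=32768, streamer=streamer)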


Serving with vLLM

  • Start the server

    cd /root/lanyun-tmp/Hugging-Face
    
    
    vllm serve ./QwQ-32B-unsloth-bnb-4bit \
    --quantization bitsandbytes \
    --load-format bitsandbytes \
    --max-model-len 500 \
    --port 8081
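
    Once the server is up, you can confirm the model is registered (a minimal check, assuming the port above):

    from openai import OpenAI
    
    client = OpenAI(api_key="1111111", base_url="http://127.0.0.1:8081/v1")
    print([m.id for m in client.models.list().data])  # should list ./QwQ-32B-unsloth-bnb-4bit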
    
  • Client code

    from openai import OpenAI
    import openai
    
    openai.api_key = '1111111' # any placeholder works here
    openai.base_url = 'http://127.0.0.1:8081/v1'
    
    
    def get_completion(prompt, model="QwQ-32B"):
        client = OpenAI(api_key=openai.api_key,
                        base_url=openai.base_url
                        )
        messages = [{"role": "user", "content": prompt}]
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            stream=False
        )
        return response.choices[0].message.content
    	
    prompt = '你好,请幽默的介绍下你自己,不少于300字'
    get_completion(prompt, model="./QwQ-32B-unsloth-bnb-4bit")
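
    For long QwQ-style reasoning outputs, a streaming variant avoids waiting for the full completion (a minimal sketch against the same server):

    from openai import OpenAI
    
    client = OpenAI(api_key="1111111", base_url="http://127.0.0.1:8081/v1")
    stream = client.chat.completions.create(
        model="./QwQ-32B-unsloth-bnb-4bit",
        messages=[{"role": "user", "content": "你好"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)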
    

CoT dataset

  • FreedomIntelligence/medical-o1-reasoning-SFT

    https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

  • Download the English split

    from datasets import load_dataset
    import rich
    
    # Login using e.g. `huggingface-cli login` to access this dataset
    ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en")
    
    rich.print(ds['train'][0])
    

    image-20250322102329936

  • Download the Chinese split

    from datasets import load_dataset
    import rich
    
    # Login using e.g. `huggingface-cli login` to access this dataset
    ds = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "zh")
    
    rich.print(ds['train'][0])
    

    image-20250322102403774
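
    The dataset has three columns, which the fine-tuning template below relies on:

    rich.print(ds['train'].column_names)  # ['Question', 'Complex_CoT', 'Response']
    rich.print(ds['train'].num_rows)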

  • After the download finishes, the data appears in the datasets directory under the Hugging-Face directory

    ll /root/lanyun-tmp/Hugging-Face/datasets/
    

    image-20250322102756924


Loading QwQ-32B with unsloth

  • unsloth can load the model and run inference directly; first, load the model

    from unsloth import FastLanguageModel
    
    max_seq_length = 2048
    dtype = None        # auto-detect
    load_in_4bit = True # load in 4-bit
    
    
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/root/lanyun-tmp/Hugging-Face/QwQ-32B-unsloth-bnb-4bit/",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    

    image-20250323002404635

    VRAM usage is about 22 GB

    image-20250323002435203

  • Inference

    # Switch the model to inference mode
    FastLanguageModel.for_inference(model)
    
    def QwQ32b_infer(question):
        # Prompt template (kept in Chinese to match the zh dataset used later)
        prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
    ### Instruction:
    你是一名助人为乐的助手。
    ### Question:
    {}
    ### Response:
    <think>{}"""
        inputs = tokenizer([prompt_style_chat.format(question, "")]
                           , return_tensors="pt"
                           ).to("cuda")
    
        outputs = model.generate(
            input_ids = inputs.input_ids,
            max_new_tokens = 2048,
            use_cache = True,
        )
        response = tokenizer.batch_decode(outputs)
        return response[0].split("### Response:")[1]
    
    question = "证明根号2是无理数"
    response = QwQ32b_infer(question)
    print(response)
    

    image-20250323003010238


Fine-tuning the model

  • Test: ask the model two questions taken from the fine-tuning dataset

    question_1 = "根据描述,一个1岁的孩子在夏季头皮出现多处小结节,长期不愈合,且现在疮大如梅,溃破流脓,口不收敛,头皮下有空洞,患处皮肤增厚。这种病症在中医中诊断为什么病?"
    
    question_2 = "一个生后8天的男婴因皮肤黄染伴发热和拒乳入院。体检发现其皮肤明显黄染,肝脾肿大和脐部少量渗液伴脐周红肿。在此情况下,哪种检查方法最有助于确诊感染病因?"
    
    
    response_1 = QwQ32b_infer(question_1)
    response_2 = QwQ32b_infer(question_2)
    
    print(response_1)
    print(response_2)
    

    image-20250323004511358

    image-20250323005528685

  • Load and preprocess the data; take the first 500 training examples as a minimum-viability experiment

    import os
    from datasets import load_dataset
    
    # Q&A prompt template (the template text stays in Chinese to match the zh dataset)
    train_prompt_style = """下面是描述任务的指令,与提供进一步上下文的输入配对。编写适当完成请求的响应。在回答之前,仔细思考问题,并创建逐步的思想链,以确保逻辑和准确的响应。
    
    ### Instruction:
    您是一位在临床推理、诊断和治疗计划方面拥有先进知识的医学专家。请回答以下医学问题。 
    
    ### Question:
    {}
    
    ### Response:
    <think>
    {}
    </think>
    {}"""
    
    # End-of-sequence token that marks where generation stops
    EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'
    
    # Map function: format each example into a single training text
    def formatting_prompts_func(examples):
        inputs = examples["Question"]
        cots = examples["Complex_CoT"]
        outputs = examples["Response"]
        texts = []
        for input, cot, output in zip(inputs, cots, outputs):
            text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
            texts.append(text)
        return {
            "text": texts,
        }
        
    # Select the first 500 training examples
    dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT"
                           ,"zh"
                           , split = "train[0:500]"
                           ,trust_remote_code=True
                          )
    dataset = dataset.map(formatting_prompts_func
                          , batched = True
                         )
    
    import rich
    rich.print(dataset[0])
    rich.print(dataset[0]['text'])
    

    image-20250323010653554
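
    A quick sanity check (a sketch, reusing the tokenizer loaded earlier) that the formatted samples fit within max_seq_length = 2048:

    lengths = [len(tokenizer(t).input_ids) for t in dataset["text"]]
    print(max(lengths), sum(lengths) / len(lengths))  # max and mean token counts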

  • Put the model into fine-tuning (LoRA) mode

    # Wrap the model with LoRA adapters for fine-tuning
    model = FastLanguageModel.get_peft_model(
        model,
        r=4, # rank of the low-rank LoRA matrices (16 is another common choice)
        target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        lora_alpha=16,
        lora_dropout=0,  
        bias="none",  
        use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
        random_state=1024,
        use_rslora=False,  
        loftq_config=None,
    )
    

    image-20250323012425940
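
    Optionally, check how few parameters LoRA actually trains (assuming the object returned above is a standard PEFT model, which exposes this helper):

    model.print_trainable_parameters()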

  • Create the trainer (the supervised fine-tuning object)

    from trl import SFTTrainer
    from transformers import TrainingArguments
    from unsloth import is_bfloat16_supported
    
    trainer = SFTTrainer(
        model=model, # the pre-trained model to fine-tune
        tokenizer=tokenizer, # tokenizer
        train_dataset=dataset, # training data
        dataset_text_field="text", # which dataset column holds the training text (built in formatting_prompts_func)
        max_seq_length=max_seq_length, # maximum sequence length, caps the number of input tokens
        dataset_num_proc=2, # number of parallel processes for data loading
        args=TrainingArguments(
            per_device_train_batch_size=1, # per-device training batch size (small values suit large models)
            gradient_accumulation_steps=4, # gradient accumulation steps, effective batch size = 1*4 = 4
            # num_train_epochs = 1, # if num_train_epochs is set, max_steps is ignored
            warmup_steps=5, # warmup steps: the learning rate starts low and ramps up
            max_steps=60, # maximum number of training steps
            learning_rate=2e-4, # learning rate
            fp16=not is_bfloat16_supported(), # use fp16 if the GPU does not support bfloat16
            bf16=is_bfloat16_supported(), # use bf16 when supported (more stable training)
            logging_steps=10, # log every 10 steps
            optim="adamw_8bit", # 8-bit AdamW optimizer to reduce VRAM usage
            weight_decay=0.01, # weight decay (L2 regularization) against overfitting
            lr_scheduler_type="linear", # linear learning-rate decay
            seed=1024, # random seed for reproducibility
            output_dir="/root/lanyun-tmp/outputs", # output directory for training results
        ),
    )
    
    # Set up wandb (optional)
    import wandb
    wandb.login(key="api-key")
    
    run = wandb.init(entity="qinchihongye-pa"
                     ,project='QwQ-32B-4bit-FT'
                    )
    
    # Start fine-tuning
    trainer_stats = trainer.train()
    
    trainer_stats
    

    image-20250323155933809

    VRAM usage during training is shown above; the training run is shown below. With per_device_train_batch_size=1 and gradient_accumulation_steps=4, the 60 steps consume 60 × 4 = 240 samples, i.e. less than one pass over the 500 selected examples.

    image-20250323160147618

    Open the wandb link to watch the loss, learning rate, gradient norm, and other metrics evolve during training.

    image-20250323160324517

  • After fine-tuning, unsloth automatically updates the model weights in memory, so the fine-tuned model can be called directly without a manual merge

    FastLanguageModel.for_inference(model)
    
    new_response_1 = QwQ32b_infer(question_1)
    new_response_2 = QwQ32b_infer(question_2)
    
    new_response_1
    new_response_2
    

    image-20250323205055248

    image-20250323205114604

    The first question is still answered incorrectly, and the second is unchanged as well; consider a larger-scale fine-tune using the full dataset and multiple epochs.

  • Merge the model

    At this point the locally saved model weights are under /root/lanyun-tmp/outputs

    image-20250323205516739

    Note: unsloth saves a checkpoint every 100 steps by default; since max_steps=60 here, there is only one checkpoint.

    Merge and save as safetensors:

    model.save_pretrained_merged("/root/lanyun-tmp/QwQ-Medical-COT-Tiny"
                                 , tokenizer
                                 , save_method = "merged_4bit_forced", # save as 4-bit quantized
                                )
    
    # model.save_pretrained_merged("dir"
    #                              , tokenizer
    #                              , save_method = "merged_16bit", # save as 16-bit
    #                             )
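
    To verify the export, the merged model can be loaded back the same way as before (a sketch; same path as above):

    from unsloth import FastLanguageModel
    
    model_ft, tokenizer_ft = FastLanguageModel.from_pretrained(
        model_name = "/root/lanyun-tmp/QwQ-Medical-COT-Tiny",
        max_seq_length = 2048,
        load_in_4bit = True,
    )
    FastLanguageModel.for_inference(model_ft)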
    

    Merge to GGUF format (requires quantization and is very time-consuming):

    # model.save_pretrained_gguf("dir"
    #                            , tokenizer
    #                            , quantization_method = "q4_k_m"
    #                           )
    
    # model.save_pretrained_gguf("dir"
    #                            , tokenizer
    #                            , quantization_method = "q8_0"
    #                           )
    
    # model.save_pretrained_gguf("dir"
    #                            , tokenizer
    #                            , quantization_method = "f16"
    #                           )
    
