An Efficient Method for Fine-Tuning Large Pretrained Models - LoRA Fine-Tuning Configuration Parameters for ChatGLM3-6B

西笑生

Last modified 2024-08-17 15:55:34

Category: Large Models. Tags: Artificial Intelligence, ChatGLM-6B, LoRA, Fine-Tuning

First published 2024-08-17 15:54:55

Article link: https://blog.csdn.net/flyfish1986/article/details/141280593

flyfish

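LoRA (Low-Rank Adaptation) is a method for efficiently fine-tuning large pretrained models. Its core idea is to introduce low-rank matrices so that far fewer parameters need to be trained, which lowers the compute cost and storage requirements of fine-tuning.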

Basic Concepts of LoRA Fine-Tuning

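Low-rank decomposition: In a neural network, weight matrices are usually high-dimensional. LoRA represents the update to such a matrix as the product of two low-rank matrices, which reduces the number of trainable parameters. For example, suppose the original weight matrix $W$ has dimensions $d \times d$. LoRA introduces two low-rank matrices $A$ and $B$, where $A$ has dimensions $d \times r$ and $B$ has dimensions $r \times d$ (with $r$ typically much smaller than $d$), so the fine-tuned weight is approximated as $W + \Delta W$, where $\Delta W = A \times B$.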
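Frozen original weights: In LoRA fine-tuning, the pretrained model's original weights stay frozen, i.e. they are no longer updated. Only the newly introduced low-rank matrices $A$ and $B$ are trained. As a result, LoRA not only reduces the number of trainable parameters but also leaves the original model's weights, and therefore its behavior without the adapter, unchanged.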
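Efficiency: Because LoRA fine-tuning only updates the low-rank matrices, the required compute and GPU memory are significantly reduced, which makes it feasible to fine-tune large models under resource constraints.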
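Use cases: LoRA fine-tuning is mainly applied to natural language processing (NLP) tasks, especially when fine-tuning large language models (such as GPT or BERT). With LoRA, researchers and developers can adapt a model to different tasks more efficiently without fine-tuning the entire model.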
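
The parameter savings described above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not code from the ChatGLM3 repository. The hidden size `d = 4096` and rank `r = 8` are illustrative assumptions, and the helper names `lora_param_counts`, `matmul`, and `apply_lora` are hypothetical.

```python
def lora_param_counts(d, r):
    """Return (full, lora) trainable-parameter counts for one d x d weight matrix."""
    full = d * d          # updating W directly
    lora = d * r + r * d  # training A (d x r) and B (r x d) instead
    return full, lora


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, a, b):
    """Return the effective weight W + A x B; W itself is left untouched."""
    delta = matmul(a, b)
    return [[wij + dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Assumed sizes for illustration only.
full, lora = lora_param_counts(d=4096, r=8)
print(full, lora, f"{lora / full:.4%}")

# Tiny example with d = 2, r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight
A = [[1.0], [2.0]]             # d x r, trainable
B = [[0.5, 0.0]]               # r x d, trainable
W_eff = apply_lora(W, A, B)
print(W_eff)
```

Under these assumed sizes, the two low-rank matrices hold 65,536 trainable values against 16,777,216 for the full matrix, which is roughly 0.39%. The small example also shows that `W` is never modified: the adaptation lives entirely in `A` and `B`.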

LoRA Fine-Tuning Configuration Options

data_config:
  train_file: train.json
  val_file: dev.json
  test_file: dev.json
  num_proc: 16
max_input_length: 256
max_output_length: 512
training_args:
  # see `transformers.Seq2SeqTrainingArguments`
  output_dir: ./output
  max_steps: 3000
  # tune the learning rate to fit your dataset
  learning_rate: 5e-5
  # settings for data loading
  per_device_train_batch_size: 4
  dataloader_num_workers: 16
  remove_unused_columns: false
  # settings for saving checkpoints
  save_strategy: steps
  save_steps: 500
  # settings for logging
  log_level: info
  logging_strategy: steps
  logging_steps: 10
  # settings for evaluation
  per_device_eval_batch_size: 16
  evaluation_strategy: steps
  eval_steps: 500
  # settings for optimizer
  # adam_epsilon: 1e-6
  # uncomment the following line to detect nan or inf values
  # debug: underflow_overflow
  predict_with_generate: true
  # see `transformers.GenerationConfig`
  generation_config:
    max_new_tokens: 512
  # set your absolute deepspeed path here
  #deepspeed: ds_zero_2.json
  # set to true to train on CPU
  use_cpu: false
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
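
A few quantities implied by the configuration above can be derived directly. The sketch below copies selected values from the YAML into a plain Python dict and computes them. It assumes single-device training without gradient accumulation, and it assumes the PEFT library's default LoRA scaling of `lora_alpha / r`. The helper name `summarize` is hypothetical.

```python
# Values copied from the YAML above into a plain dict (no YAML parser required).
config = {
    "training_args": {
        "max_steps": 3000,
        "per_device_train_batch_size": 4,
        "save_steps": 500,
        "eval_steps": 500,
    },
    "peft_config": {"r": 8, "lora_alpha": 32, "lora_dropout": 0.1},
}


def summarize(cfg, num_devices=1):
    """Derive a few quantities implied by the config (hypothetical helper)."""
    args = cfg["training_args"]
    peft = cfg["peft_config"]
    return {
        # Assumption: PEFT's default scaling, the update is multiplied by lora_alpha / r.
        "lora_scaling": peft["lora_alpha"] / peft["r"],
        # Assumption: no gradient accumulation, so one step consumes one batch per device.
        "samples_seen": args["max_steps"] * args["per_device_train_batch_size"] * num_devices,
        # With step-based strategies, one checkpoint / evaluation every N steps.
        "checkpoints": args["max_steps"] // args["save_steps"],
        "evaluations": args["max_steps"] // args["eval_steps"],
    }


summary = summarize(config)
print(summary)
```

With these values the low-rank update is scaled by 4.0, training consumes 12,000 samples on a single device, and the run produces six checkpoints and six evaluation passes. Raising `r` without also raising `lora_alpha` lowers the scaling factor, so the two are usually adjusted together.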