Large Language Model Fine-Tuning

Fine-tuning adjusts an already-trained LLM so that it adapts to domain-specific knowledge and tasks.
Contrasted with RAG, it is one of the key technical routes for applying LLMs to private-domain data.

Why

A general-purpose pre-trained model often lacks private-domain knowledge; fine-tuning bakes that knowledge and the desired task behavior directly into the model's weights (contrast this with RAG in the comparison section below).

How

A typical fine-tuning workflow proceeds through the following steps.

a. Data preparation

Data preparation involves curating and preprocessing the dataset to ensure its relevance and quality for the specific task. This may include tasks such as cleaning the data, handling missing values, and formatting the text to align with the model’s input requirements.

Additionally, data augmentation techniques can be employed to expand the training dataset and improve the model’s robustness. Proper data preparation is essential for fine-tuning as it directly impacts the model’s ability to learn and generalize effectively, ultimately leading to improved performance and accuracy in generating task-specific outputs.
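As a minimal sketch of this step, assuming a local JSONL file of instruction/response pairs and the Hugging Face datasets library (the file name and field names here are illustrative, not prescribed by any particular model):

```python
# Minimal data-preparation sketch: load, clean, format, and split a dataset.
# Assumes train.jsonl contains {"instruction": ..., "response": ...} records.
from datasets import load_dataset

raw = load_dataset("json", data_files="train.jsonl", split="train")

def clean_and_format(example):
    # Strip stray whitespace and fold the pair into one training text.
    instruction = example["instruction"].strip()
    response = example["response"].strip()
    example["text"] = f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
    return example

# Drop records with missing fields (a simple form of cleaning), then format.
dataset = raw.filter(lambda ex: ex.get("instruction") and ex.get("response"))
dataset = dataset.map(clean_and_format)
dataset = dataset.train_test_split(test_size=0.1, seed=42)  # held-out validation split
```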

b. Choosing the right pre-trained model

It’s crucial to select a pre-trained model that aligns with the specific requirements of the target task or domain. Understanding the architecture, input/output specifications, and layers of the pre-trained model is essential for seamless integration into the fine-tuning workflow.

Factors such as the model size, training data, and performance on relevant tasks should be considered when making this choice. By selecting a pre-trained model that closely matches the characteristics of the target task, you can streamline the fine-tuning process and maximize the model’s adaptability and effectiveness for the intended application.
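For example, with the Hugging Face transformers library you might load a candidate and inspect its size and input/output specification before committing to it (the model name below is only an example; choose one that fits your task, license, and hardware):

```python
# Load a candidate pre-trained model and check its basic characteristics.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Input/output specification: context window and vocabulary size.
print(tokenizer.model_max_length, model.config.vocab_size)
# Model size in billions of parameters.
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```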

c. Identifying the right parameters for fine-tuning

Configuring the fine-tuning parameters is crucial for achieving optimal performance in the fine-tuning process. Parameters such as the learning rate, number of training epochs, and batch size play a significant role in determining how the model adapts to the new task-specific data. Additionally, selectively freezing certain layers (typically the earlier ones) while training the final layers is a common practice to prevent overfitting.

By freezing early layers, the model retains the general knowledge gained during pre-training while allowing the final layers to adapt specifically to the new task. This approach helps maintain the model’s ability to generalize while ensuring that it learns task-specific features effectively, striking a balance between leveraging pre-existing knowledge and adapting to the new task.
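A sketch of this setup, assuming a Llama-style causal LM whose transformer blocks are exposed as model.model.layers (adjust the attribute path for other architectures; all hyperparameter values are starting points, not recommendations):

```python
# Freeze everything, then unfreeze only the last few transformer blocks,
# so early layers keep their general pre-trained knowledge.
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

for param in model.parameters():
    param.requires_grad = False
for layer in model.model.layers[-4:]:   # number of trainable blocks is a tuning choice
    for param in layer.parameters():
        param.requires_grad = True

args = TrainingArguments(
    output_dir="finetuned-model",
    learning_rate=2e-5,                  # common starting point; tune per task
    num_train_epochs=3,
    per_device_train_batch_size=8,
)
```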

d. Validation

Validation involves evaluating a fine-tuned model’s performance using a validation set. Monitoring metrics such as accuracy, loss, precision, and recall provide insights into the model’s effectiveness and generalization capabilities.

By assessing these metrics, you can gauge how well the fine-tuned model is performing on the task-specific data and identify potential areas for improvement. This validation process allows for the refinement of fine-tuning parameters and model architecture, ultimately leading to an optimized model that excels in generating accurate outputs for the intended application.
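A small sketch of computing such metrics with scikit-learn, here for a classification-style fine-tune; the label and prediction lists are placeholders for whatever your evaluation loop actually produces:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder outputs; in practice these come from running the fine-tuned
# model over the validation set.
eval_labels = [0, 1, 1, 0, 1]
eval_preds  = [0, 1, 0, 0, 1]

accuracy = accuracy_score(eval_labels, eval_preds)
precision, recall, f1, _ = precision_recall_fscore_support(
    eval_labels, eval_preds, average="weighted"
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```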

e. Model iteration

Model iteration allows you to refine the model based on evaluation results. Upon assessing the model’s performance, adjustments to fine-tuning parameters, such as learning rate, batch size, or the extent of layer freezing, can be made to enhance the model’s effectiveness.

Additionally, exploring different strategies, such as employing regularization techniques or adjusting the model architecture, enables you to improve the model’s performance iteratively. This empowers engineers to fine-tune the model in a targeted manner, gradually refining its capabilities until the desired level of performance is achieved.
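One minimal way to structure such iteration is a loop over candidate settings; train_and_evaluate below is a hypothetical helper (stubbed here) that would wrap one full fine-tuning run and return a validation loss:

```python
# Iterate over candidate hyperparameters, keeping the best validation loss.
def train_and_evaluate(learning_rate: float, trainable_blocks: int) -> float:
    """Hypothetical helper: fine-tune with these settings and return val loss.
    Stubbed here; wire it to your actual training/evaluation pipeline."""
    return 0.0  # placeholder

best = None
for lr in (1e-5, 2e-5, 5e-5):
    for trainable_blocks in (2, 4):          # extent of layer unfreezing
        val_loss = train_and_evaluate(lr, trainable_blocks)
        if best is None or val_loss < best[0]:
            best = (val_loss, lr, trainable_blocks)
print("best (val_loss, lr, trainable_blocks):", best)
```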

f. Model deployment

Model deployment marks the transition from development to practical application, and it involves the integration of the fine-tuned model into the specific environment. This process encompasses considerations such as the hardware and software requirements of the deployment environment and model integration into existing systems or applications.

Additionally, aspects like scalability, real-time performance, and security measures must be addressed to ensure a seamless and reliable deployment. By successfully deploying the fine-tuned model into the specific environment, you can leverage its enhanced capabilities to address real-world challenges.
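As a minimal serving sketch, one could expose the fine-tuned model over HTTP with FastAPI and a transformers pipeline (the endpoint name and local model path are illustrative; production deployments would add batching, auth, and monitoring):

```python
# Minimal deployment sketch: serve the fine-tuned model over HTTP.
# "finetuned-model" is the local output directory from training.
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="finetuned-model")

@app.post("/generate")
def generate(prompt: str, max_new_tokens: int = 128):
    out = generator(prompt, max_new_tokens=max_new_tokens)
    return {"completion": out[0]["generated_text"]}
```

Run with, e.g., `uvicorn app:app` and send prompts to the /generate endpoint.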

Challenges

  1. Requires a large amount of high-quality, representative private-domain knowledge.
  2. Incurs additional training cost (a rough estimate follows this list).
  3. The effect is not guaranteed: the improvement may be small, or the model may overfit (losing its general reasoning ability); the outcome depends on training-data quality and on the training method and hyperparameter settings.
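As a back-of-the-envelope illustration of the training cost (using the standard rule of thumb that full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter, before activations; the 7B model size is just an example):

```python
# Rough GPU-memory estimate for full fine-tuning with Adam in mixed precision.
# Rule of thumb: fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights,
# momentum, and variance (4+4+4 B) ≈ 16 bytes per parameter, excluding activations.
params = 7e9                      # e.g. a 7B-parameter model
bytes_per_param = 16
print(params * bytes_per_param / 2**30, "GiB")   # ≈ 104 GiB before activations
```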


Fine-tuning vs. RAG

First, the knowledge dimension

RAG updates knowledge faster and more cheaply: no training is needed, only an update to the database.

RAG also gives tighter control over knowledge; compared with fine-tuning, there is far less worry about the model failing to learn something or forgetting it.

However, if the base model badly lacks knowledge of a domain, only fine-tuning on sufficient data can give it a basic grasp of that domain; without that foundation, RAG alone still cannot answer correctly.

Second, the effectiveness dimension

Compared with fine-tuning, RAG reaches good results more easily, with stability and interpretability as its strong points.

For tasks with a relatively simple pattern, fine-tuning can reach a higher ceiling, but its demands on training and data are much stricter.

On hallucination: across various real-world tests, RAG's weak point is almost always the retrieval module; as long as retrieval does not go badly wrong, RAG holds the overall advantage.

Third, the cost dimension

For training, RAG's cost is just updating the database, whereas fine-tuning requires substantial GPU and time resources.

For inference, RAG adds a retrieval step, and the retrieval layer usually needs extra machinery to keep results accurate, so inference takes longer than with a fine-tuned model. How much longer depends on the complexity of the retrieval module: if it also calls a large model, the cost rises considerably; if it only uses small models, the overhead is negligible. A fine-tuned model is used directly and has the same latency as the original model.

Finally, from a system-scaling perspective: as a project grows, a single fine-tuned model may not support multiple tasks, while training and deploying several separate large models is inconvenient.

References

https://blog.csdn.net/musicml/article/details/136576532
https://blog.csdn.net/YELLOWAUG/article/details/139029001
https://blog.csdn.net/zwqjoy/article/details/132244654 (estimating GPU memory usage during LLM training)

https://www.turing.com/resources/finetuning-large-language-models
