Installing LoRA (kohya_ss), launching it, and training a model

Latest recommended article published on 2024-07-10 18:17:20

learner_ctr


Category: AIGC. Tags: python, programming languages

Article link: https://blog.csdn.net/a1066196847/article/details/131040049


I. Installation steps

conda create -n kohya_ss python=3.10.8
cd code
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
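Next, I changed the xformers download URL in setup.sh, because the default URL took about an hour to pull, which is far too slow. I replaced it with: https://huggingface.co/Renqf/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl/resolve/main/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl. Both wheels are 108.4M, so it should be the same file.
Then git add and commit this change.
Finally, run the installer: ./setup.sh -d ./kohya_ss  -v 3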
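The URL swap in setup.sh can also be scripted rather than done by hand. The following is only a minimal sketch: it assumes the script references the xformers wheel as a single https URL whose file name starts with "xformers" and ends in ".whl", and the function name swap_xformers_wheel_url is hypothetical, not part of kohya_ss.

```python
import re

# Replacement wheel URL quoted in this post.
NEW_WHEEL_URL = (
    "https://huggingface.co/Renqf/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl"
    "/resolve/main/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl"
)

# Assumption: the wheel is referenced by an https URL whose last path
# component starts with "xformers" and ends with ".whl".
XFORMERS_WHEEL_RE = re.compile(r"https://\S*/xformers[^\s/]*\.whl")


def swap_xformers_wheel_url(script_text, new_url=NEW_WHEEL_URL):
    """Return (patched_text, number_of_replacements)."""
    # A callable replacement avoids any backslash interpretation in new_url.
    return XFORMERS_WHEEL_RE.subn(lambda _match: new_url, script_text)
```

To apply it, read setup.sh into a string (for example with pathlib.Path("setup.sh").read_text()), pass it through the function, check that the replacement count is what you expect, and write the result back before committing.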

II. Launching LoRA

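If you launch with ./gui.sh, as suggested at the bottom of the log after a successful install, you may run into an error. It is fairly hard to fix because it comes down to version compatibility between torch, cuda, and cudnn.

The kohya_ss install script contains the lines below. The torch and cuda versions already match; the one that does not match is cudnn. As for why the author does not pin cudnn (the training acceleration library) here as well, presumably it is left for us to set up ourselves. The mismatch does not actually matter: we only need to bypass this check.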

 "linux-gnu"*) pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116 >&3 &&
    pip install -U -I --no-deps \
      https://huggingface.co/Renqf/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl/resolve/main/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl >&3 ;;
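Before working around the check, it can help to see which versions the environment actually reports. This is a rough sketch, not something from kohya_ss: report_torch_stack is a hypothetical helper, and it only reads attributes that PyTorch exposes (torch.__version__, torch.version.cuda, torch.backends.cudnn.version()).

```python
def report_torch_stack():
    """Collect the torch, CUDA and cuDNN versions visible to this interpreter."""
    info = {"torch": None, "cuda": None, "cudnn": None, "gpu_available": False}
    try:
        import torch
    except ImportError:
        return info  # torch is not installed in this environment

    info["torch"] = torch.__version__
    # CUDA version torch was built against; None on CPU-only builds.
    info["cuda"] = torch.version.cuda
    try:
        # Integer such as 8302, or None when cuDNN is unavailable.
        info["cudnn"] = torch.backends.cudnn.version()
    except RuntimeError as exc:
        # PyTorch raises here when the loaded cuDNN does not match the build.
        info["cudnn"] = "error: " + str(exc)
    info["gpu_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_torch_stack().items():
        print(key, "=", value)
```

Run it inside the kohya_ss conda environment so that it reports the packages the trainer will actually use.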

The author provides two ways to launch; using python ./kohya_gui.py works.

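III. Problems during training

1. Precision problem

This happens because the GPU is an older model that does not support bf16 precision. Switch the mixed precision setting to fp16 for training: https://github.com/bmaltais/kohya_ss/issues/93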

  File "/home/pai/envs/kohya_ss/lib/python3.10/site-packages/accelerate/accelerator.py", line 426, in __init__
    raise ValueError(err.format(mode="bf16", requirement="PyTorch >= 1.10 and a supported device."))
ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device.
Traceback (most recent call last):
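The bf16 versus fp16 choice can be decided up front instead of waiting for the ValueError. The sketch below rests on a rule of thumb that is an assumption here, not something stated in the post: NVIDIA GPUs with compute capability 8.x (Ampere) or newer handle bf16, while older ones should use fp16. Both function names are hypothetical.

```python
from typing import Optional


def pick_mixed_precision(cc_major: Optional[int]) -> str:
    """Map a CUDA compute capability major version to a mixed precision mode.

    Returns one of the mode names accelerate accepts: "no", "fp16", "bf16".
    """
    if cc_major is None:
        return "no"  # no usable GPU detected
    # Assumption: bf16 needs compute capability 8.x or newer.
    return "bf16" if cc_major >= 8 else "fp16"


def detect_cc_major() -> Optional[int]:
    """Return the compute capability major version of GPU 0, or None."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, _minor = torch.cuda.get_device_capability(0)
    return major


if __name__ == "__main__":
    print("suggested mixed precision:", pick_mixed_precision(detect_cc_major()))
```

Whatever it suggests, the value still has to be selected in the GUI's mixed precision setting; the script only tells you which one to pick.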

2. Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop

Running python setup.py build develop does not fix this. I went through some material and did not find a good solution.

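3. RuntimeError: No such operator xformers::efficient_attention_forward_cutlass - did you forget to build xformers with `python setup.py develop`?

Solution 1: do not use xformers, and the error goes away. In my tests, enabling or disabling it made no noticeable difference to training time.

Solution 2: https://github.com/bmaltais/kohya_ss/issues/784

Run pip install xformers==0.0.19. Watching the log while it runs shows that it also upgrades torch to 2 and CUDA to 11.7.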
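Since that pip install silently pulls in a different torch, it is worth recording the installed versions before and after. A small sketch using only the standard library; installed_versions is a hypothetical helper and the default package list is just the set of packages mentioned in this post.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "xformers")):
    """Return {package_name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version)
```

Running it once before and once after the pip command makes the upgrade visible without reading through the whole install log.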

On this run, with use xformers checked, training no longer failed, but the log showed the following message instead:

CrossAttention.forward has been replaced to FlashAttention (not xformers)

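After looking through some material, https://www.reddit.com/r/StableDiffusion/comments/114e0nj/kohya_ss_error_how_do_i_solve_this/ was the closest thread I could find. It does not mention the fix I ended up with, but the whole discussion is about training parameters, so I unchecked the "memory optimization" option and the message went away. The correct log line is: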

CrossAttention.forward has been replaced to enable xformers.
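The two log lines suggest that the memory optimization option takes priority over use xformers when both are checked. The sketch below only models that inferred behavior. It is an assumption drawn from the messages in this post, not code copied from kohya_ss, and the function name, flag names, and backend names are all hypothetical.

```python
# Log lines quoted in this post, keyed by the backend they appear to indicate.
LOG_LINES = {
    "flash_attention": "CrossAttention.forward has been replaced to FlashAttention (not xformers)",
    "xformers": "CrossAttention.forward has been replaced to enable xformers.",
}


def attention_backend(mem_eff_attn: bool, use_xformers: bool) -> str:
    """Guess which attention implementation ends up active.

    Assumption: the memory optimization flag is checked first, so it wins
    even when use xformers is also enabled.
    """
    if mem_eff_attn:
        return "flash_attention"
    if use_xformers:
        return "xformers"
    return "default"


if __name__ == "__main__":
    # Both options checked: xformers is not the backend that gets used.
    print(LOG_LINES[attention_backend(True, True)])
    # Only use xformers checked: the expected log line appears.
    print(LOG_LINES[attention_backend(False, True)])
```

Under that assumption, the first message is not really an error: it just reports that xformers was requested but a different attention path was chosen.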

------------------------------------------------

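I have recently been working on a project: dressing a specified model in specified clothes. If you genuinely enjoy this area and have solid skills, feel free to add me on WeChat to trade bugs and experience: ranksearch (WeChat ID)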