I was trying to deploy and fine-tune the LLaVA multimodal model, following the official instructions step by step. Running
pip install flash-attn --no-build-isolation
failed with an error like this:
Collecting flash-attn
Downloading flash_attn-2.5.6.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 11.4 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-9u5e9dng/flash-attn_e362cbbd46404df8a4978593d8bb899c/setup.py", line 114, in <module>
raise RuntimeError(
RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.__version__ = 2.1.2+cu121
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
That is roughly what it looks like (this copy is taken from a GitHub issue). The failure is almost always caused by the CUDA version. There are many fixes floating around online and most of them did not work for me, so let me start with the one that finally solved my problem:
Install nvcc in the environment
This solution assumes you already have matching versions of CUDA and PyTorch installed, which you can verify with:
import torch
print(torch.version.cuda)
There are plenty of tutorials covering that part; look one up if needed.
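flash-attn's setup.py enforces CUDA 11.6 or newer by inspecting the output of nvcc -V, so it is worth checking that yourself before retrying. A minimal sketch of such a check (the helper names and the parsing regex are my own, not part of flash-attn):

```python
import re

# Minimum CUDA version required by flash-attn's setup.py check.
MIN_CUDA = (11, 6)

def cuda_version_from_nvcc(nvcc_output: str) -> tuple:
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise RuntimeError("could not parse nvcc output; is nvcc installed?")
    return int(match.group(1)), int(match.group(2))

def cuda_is_supported(nvcc_output: str) -> bool:
    """True when the nvcc version meets flash-attn's minimum."""
    return cuda_version_from_nvcc(nvcc_output) >= MIN_CUDA

# Example against the version line nvcc actually prints:
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(cuda_is_supported(sample))  # True for 12.1
```

In practice you would feed this the real output, e.g. subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout.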
Even once PyTorch works, flash-attn still cannot be installed directly with pip. The reason is that nvcc is not installed inside the virtual environment, so the build falls back to the system-wide CUDA toolkit. The fix is simply:
conda install cuda-nvcc
If that command fails, try the conda-forge channel instead:
conda install cuda-nvcc -c conda-forge
After that, flash-attn installs correctly via pip.
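The reason this works: conda puts an nvcc into the environment whose version lines up with the CUDA build PyTorch was compiled against (cu121 in the log above), instead of whatever the system happens to ship. A quick sanity check you could run before retrying pip (the helper is hypothetical, just to illustrate the comparison):

```python
def same_major_cuda(torch_cuda: str, nvcc_release: str) -> bool:
    """Return True when PyTorch's CUDA build and nvcc share a major version.

    torch_cuda:   the string from torch.version.cuda, e.g. "12.1"
    nvcc_release: the "release X.Y" number printed by `nvcc -V`, e.g. "12.1"
    """
    return torch_cuda.split(".")[0] == nvcc_release.split(".")[0]

# After `conda install cuda-nvcc`, the env's nvcc should match torch:
print(same_major_cuda("12.1", "12.1"))  # True  -> safe to retry pip install flash-attn
print(same_major_cuda("12.1", "11.3"))  # False -> the flash-attn build will fail
```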
There are other approaches as well, for example:
Download a prebuilt wheel matching your environment from https://github.com/Dao-AILab/flash-attention/releases, then run pip install *.whl.
In short, these failures all come down to CUDA version problems, so check your versions carefully.
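When picking a wheel from the releases page, the filename encodes the CUDA build, torch version, C++ ABI, and Python version, and all of them must match your environment. A small sketch that pulls those tags out of a filename (the pattern below is my reading of the release asset names, not an official spec; check the releases page for the exact files):

```python
import re

# Release wheels are named roughly like:
#   flash_attn-2.5.6+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[\d.]+)\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<py>\d+)-"
)

def wheel_tags(filename: str) -> dict:
    """Pull the CUDA / torch / ABI / Python tags out of a wheel filename."""
    m = WHEEL_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return m.groupdict()

tags = wheel_tags(
    "flash_attn-2.5.6+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
)
print(tags["cuda"], tags["torch"], tags["py"])  # 122 2.1 310
```

Here "cu122" must match torch.version.cuda, "torch2.1" must match your torch release, and "cp310" must match your Python; a mismatch in any of them gives an install or import error.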