I was trying to deploy and fine-tune the LLaVA multimodal model, following the official instructions step by step. Running
pip install flash-attn --no-build-isolation
failed with an error like this:
Collecting flash-attn
Downloading flash_attn-2.5.6.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 11.4 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-9u5e9dng/flash-attn_e362cbbd46404df8a4978593d8bb899c/setup.py", line 114, in <module>
raise RuntimeError(
RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.__version__ = 2.1.2+cu121
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
That is roughly what it looks like (this copy is taken from a GitHub issue). The failure is almost always caused by the CUDA version. There are many fixes floating around online and most of them did not work for me, so let me start with the one that finally solved my problem:
Install nvcc in the environment
This solution assumes you already have matching versions of CUDA and PyTorch installed, which you can verify with:
import torch
print(torch.version.cuda)
There are plenty of tutorials covering that part; look one up if needed.
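flash-attn's setup.py enforces CUDA 11.6 or newer by inspecting the output of nvcc -V, so it is worth checking that yourself before retrying. A minimal sketch of such a check (the helper names and the parsing regex are my own, not part of flash-attn):

```python
import re

# Minimum CUDA version required by flash-attn's setup.py check.
MIN_CUDA = (11, 6)

def cuda_version_from_nvcc(nvcc_output: str) -> tuple:
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise RuntimeError("could not parse nvcc output; is nvcc installed?")
    return int(match.group(1)), int(match.group(2))

def cuda_is_supported(nvcc_output: str) -> bool:
    """True when the nvcc version meets flash-attn's minimum."""
    return cuda_version_from_nvcc(nvcc_output) >= MIN_CUDA

# Example against the version line nvcc actually prints:
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(cuda_is_supported(sample))  # True for 12.1
```

In practice you would feed this the real output, e.g. subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout.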
Even once PyTorch works, flash-attn still cannot be installed directly with pip. The reason is that nvcc is not installed inside the virtual environment, so the build falls back to the system-wide CUDA toolkit. The fix is simply:
conda install cuda-nvcc
If that command fails, try the conda-forge channel instead:
conda install cuda-nvcc -c conda-forge
After that, flash-attn installs correctly via pip.
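The reason this works: conda puts an nvcc into the environment whose version lines up with the CUDA build PyTorch was compiled against (cu121 in the log above), instead of whatever the system happens to ship. A quick sanity check you could run before retrying pip (the helper is hypothetical, just to illustrate the comparison):

```python
def same_major_cuda(torch_cuda: str, nvcc_release: str) -> bool:
    """Return True when PyTorch's CUDA build and nvcc share a major version.

    torch_cuda:   the string from torch.version.cuda, e.g. "12.1"
    nvcc_release: the "release X.Y" number printed by `nvcc -V`, e.g. "12.1"
    """
    return torch_cuda.split(".")[0] == nvcc_release.split(".")[0]

# After `conda install cuda-nvcc`, the env's nvcc should match torch:
print(same_major_cuda("12.1", "12.1"))  # True  -> safe to retry pip install flash-attn
print(same_major_cuda("12.1", "11.3"))  # False -> the flash-attn build will fail
```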
There are other approaches as well, for example:
Download a prebuilt wheel matching your environment from https://github.com/Dao-AILab/flash-attention/releases, then run pip install *.whl.
In short, these failures all come down to CUDA version problems, so check your versions carefully.
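When picking a wheel from the releases page, the filename encodes the CUDA build, torch version, C++ ABI, and Python version, and all of them must match your environment. A small sketch that pulls those tags out of a filename (the pattern below is my reading of the release asset names, not an official spec; check the releases page for the exact files):

```python
import re

# Release wheels are named roughly like:
#   flash_attn-2.5.6+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[\d.]+)\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<py>\d+)-"
)

def wheel_tags(filename: str) -> dict:
    """Pull the CUDA / torch / ABI / Python tags out of a wheel filename."""
    m = WHEEL_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return m.groupdict()

tags = wheel_tags(
    "flash_attn-2.5.6+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
)
print(tags["cuda"], tags["torch"], tags["py"])  # 122 2.1 310
```

Here "cu122" must match torch.version.cuda, "torch2.1" must match your torch release, and "cp310" must match your Python; a mismatch in any of them gives an install or import error.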