Mamba Environment Setup Pitfalls on a Windows Server with a 1080 Ti GPU

Thanks to the author of this tutorial: https://blog.csdn.net/zly_Always_be/article/details/140400011

Essentially all of the installation commands can be found in the link above; this post focuses on the two errors that gave me the biggest headaches this past week.

Problem 1

After installing mamba_ssm and causal_conv1d following the tutorial above, running the test code fails with:

File "/home/user/miniconda3/envs/textgen/lib/python3.11/site-packages/mamba_ssm/modules/mamba_simple.py", line 223, in step
x = causal_conv1d_update(
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/textgen/lib/python3.11/site-packages/causal_conv1d/causal_conv1d_interface.py", line 83, in causal_conv1d_update
return causal_conv1d_cuda.causal_conv1d_update(x, conv_state, weight, bias, activation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

The cause: the stock setup.py in both libraries only compiles CUDA kernels for GPUs with compute capability 7.0 and above:

    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_70,code=sm_70")
    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_80,code=sm_80")
    if bare_metal_version >= Version("11.8"):
        cc_flag.append("-gencode")
        cc_flag.append("arch=compute_90,code=sm_90")

Solution to Problem 1

Modify setup.py in both mamba_ssm and causal_conv1d.

The 1080 Ti and its Pascal-generation siblings have compute capability 6.x (a 1080 Ti reports 6.1), so we need to add the following lines to the setup.py of both libraries so that kernels are also built for compute capability 6.x cards (a kernel built for sm_60 runs on any 6.x device, since CUDA binaries are forward-compatible within a major compute capability):

    # added: also emit a kernel image for compute capability 6.x (Pascal)
    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_60,code=sm_60")

    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_70,code=sm_70")
    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_80,code=sm_80")
    if bare_metal_version >= Version("11.8"):
        cc_flag.append("-gencode")
        cc_flag.append("arch=compute_90,code=sm_90")

Problem 2

With setup.py fixed, the build succeeds but a new error appears, roughly saying that a string has no tensor method contiguous:

Traceback (most recent call last):
  File "E:\YuHan\骨干模型\Mamba\2023_Mamba\test.py", line 13, in <module>
    y = model(x)
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\mamba_ssm\modules\mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\mamba_ssm\ops\selective_scan_interface.py", line 307, in mamba_inner_fn
    return mamba_inner_ref(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\mamba_ssm\ops\selective_scan_interface.py", line 323, in mamba_inner_ref
    x = causal_conv1d_fn(x, rearrange(conv1d_weight, "d 1 w -> d w"), conv1d_bias, "silu")
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\causal_conv1d\causal_conv1d_interface.py", line 49, in causal_conv1d_fn
    return CausalConv1dFn.apply(x, weight, bias, seq_idx, activation)
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\Users\Admin\anaconda3\envs\yhmamba\lib\site-packages\causal_conv1d\causal_conv1d_interface.py", line 19, in forward
    seq_idx = seq_idx.contiguous() if seq_idx is not None else None
AttributeError: 'str' object has no attribute 'contiguous'

Solution to Problem 2

Tracing the error back leads to line 323 of selective_scan_interface.py, inside the mamba_inner_ref function:

    x, z = xz.chunk(2, dim=1)
    x = causal_conv1d_fn(x, rearrange(conv1d_weight, "d 1 w -> d w"), conv1d_bias, "silu")
    # We're being very careful here about the layout, to avoid extra transposes.
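
The string "silu" here is intended as the activation, but in newer releases of causal_conv1d the fourth parameter of causal_conv1d_fn is seq_idx, with activation coming after it. You can verify this on your own install by printing the function's signature (a quick diagnostic sketch):

import inspect
from causal_conv1d import causal_conv1d_fn

# If seq_idx appears before activation in the output, a positional
# "silu" gets bound to seq_idx, which explains the AttributeError above.
print(inspect.signature(causal_conv1d_fn))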

Because the positional "silu" is captured by seq_idx, which must be a tensor (or None), the call fails with the AttributeError shown above. Passing the activation by keyword resolves the mismatch while keeping the SiLU activation, and the code runs correctly:

    x, z = xz.chunk(2, dim=1)
    x = causal_conv1d_fn(x, rearrange(conv1d_weight, "d 1 w -> d w"), conv1d_bias, activation="silu")
    # We're being very careful here about the layout, to avoid extra transposes.
    # We want delta to have d as the slowest moving dimension
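
Note that simply replacing the fourth argument with None also makes the error go away, but it silently disables the SiLU activation, so the layer's outputs would no longer match the reference implementation. The keyword form is also robust if the library reorders its parameters again.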

Test code (essentially the usage example from the mamba_ssm README):

import torch
from mamba_ssm import Mamba

batch, length, dim = 8, 64, 8
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim,  # Model dimension d_model
    d_state=16,  # SSM state expansion factor
    d_conv=4,  # Local convolution width
    expand=2,  # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
print('success')

Results after the fixes

With both changes in place, the test script runs to completion and prints success.

At this point, the Mamba environment is fully working on a Windows machine with a 1080 Ti. If you hit any problems, feel free to discuss them with me in the comments. Thanks for your support, and I hope this helps.
