Fix: FlashAttention only supports Ampere GPUs or newer.


曼城周杰伦

Article link: https://blog.csdn.net/victor_manches/article/details/142235977

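FlashAttention is an optional feature for accelerating model training and inference. FlashAttention-2, which raises this error, only runs on NVIDIA GPUs with the Ampere, Ada, or Hopper architecture (e.g. A100, RTX 3090, RTX 4090, H100). Turing GPUs (e.g. T4) are supported only by FlashAttention 1.x, so on Turing or older cards FlashAttention-2 stops with the error in the title.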

1. First, check whether the GPU supports FlashAttention:

import torch


def supports_flash_attention(device_id: int) -> bool:
    """Check if a GPU supports FlashAttention."""
    major, minor = torch.cuda.get_device_capability(device_id)

    # Ampere and Ada report SM 8.x; Hopper reports SM 9.0
    is_sm8x = major == 8
    is_sm90 = major == 9 and minor == 0

    return is_sm8x or is_sm90


device_id = 0  # GPU index: 0, 1, 2, ...
print(supports_flash_attention(device_id))
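
Once the check result is known, one common way to avoid the error is to fall back to a different attention backend when the GPU is too old. The sketch below is only an illustration, not part of the original post: the helper name pick_attn_implementation is hypothetical, and it assumes the model is loaded through Hugging Face Transformers, whose from_pretrained accepts an attn_implementation argument with values such as "flash_attention_2" and "sdpa". It takes the (major, minor) tuple directly, so it can be tried without a GPU.

```python
from typing import Tuple


def pick_attn_implementation(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability to an attention backend name.

    Hypothetical helper: the returned strings are meant to be passed as
    `attn_implementation` to Hugging Face Transformers (an assumption
    about how the model is loaded).
    """
    major, minor = capability

    # Same rule as the check above: SM 8.x (Ampere, Ada) or SM 9.0 (Hopper)
    if major == 8 or (major == 9 and minor == 0):
        return "flash_attention_2"

    # Older GPUs (e.g. T4 with SM 7.5) fall back to PyTorch's built-in SDPA
    return "sdpa"
```

With PyTorch installed, the argument would come from torch.cuda.get_device_capability(device_id), and the returned string would be forwarded as attn_implementation when loading the model.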