Transformer Optimization and Acceleration: xformers

Latest recommended article published on 2025-03-09 13:01:12

贾亚飞

Category: AI. Tags: natural language processing

Article link: https://blog.csdn.net/weixin_40777649/article/details/138575209

I. Definition

1. Purpose
2. Key optimizations
3. Usage demo

II. Implementation

Purpose
Proposed by Facebook (facebookresearch), xformers speeds up attention computation and reduces GPU memory usage.
References: https://github.com/facebookresearch/xformers
https://zhuanlan.zhihu.com/p/688745007
API: https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention
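Key optimizations
The implementation uses FlashAttention: attention is computed block by block instead of materializing the full attention score matrix, which lowers GPU memory usage and increases speed.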
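
To make the memory saving concrete, here is a minimal pure-Python sketch of the blockwise "online softmax" idea behind FlashAttention, for a single query vector. It is not the xformers kernel: the function names `naive_attention` and `blockwise_attention` and the `block_size` parameter are hypothetical, chosen only for illustration. The sketch shows that processing keys and values one block at a time, while keeping a running maximum and a running sum, should give the same result as computing all scores at once.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: compute every score first, then apply softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Illustrative sketch: visit keys/values in blocks, never holding all scores."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * dim            # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]

        # Rescale earlier partial results to the new maximum
        # (the factor is 0.0 on the first block, where nothing is accumulated yet).
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Only one block of scores is alive at any time, so the extra memory depends on `block_size` rather than on the sequence length. Real kernels apply the same rescaling trick to tiles of queries and keys on the GPU.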
Usage demo

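from torch import Tensor

# xFormers is treated as an optional dependency here (an assumed setup).
try:
    import xformers.ops as xops
    from xformers.ops import unbind
    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False

# Attention (the parent class providing self.qkv and self.num_heads) is assumed to be defined elsewhere.
class MemEffAttention(Attention):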
    def forward(self, x: Tensor, attn_bias=None) -> Tensor:
        if not XFORMERS_AVAILABLE:
            if attn_bias is not None:
                raise AssertionError("xFormers is required for using nested tensors")
            return super().forward(x)

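        # B = batch size, N = number of tokens, C = embedding dimension
        B, N, C = x.shape
        # Project to q/k/v and split heads: [B, N, 3, num_heads, head_dim]
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)

        # Split along dim 2 into q, k, v, each [B, N, num_heads, head_dim]
        q, k, v = unbind(qkv, 2)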

        x