I. Definition
1. Purpose
2. Optimization highlights
3. Usage demo
II. Implementation
- Purpose
Proposed by Facebook (Meta), xformers effectively accelerates attention computation while reducing GPU memory usage.
References: https://github.com/facebookresearch/xformers
https://zhuanlan.zhihu.com/p/688745007
API: https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention (a minimal call sketch follows)
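As a quick orientation, here is a minimal sketch of calling the op directly. The shapes and values are made up for illustration; inputs are laid out as (batch, seq_len, num_heads, head_dim), and half precision on a CUDA device is the typical setup:

import torch
import xformers.ops as xops

# Toy shapes for illustration: batch=2, seq_len=1024, heads=8, head_dim=64
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Same math as softmax(q @ k^T / sqrt(64)) @ v, but the full
# (1024 x 1024) attention matrix is never materialized.
out = xops.memory_efficient_attention(q, k, v)  # -> (2, 1024, 8, 64)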
- Optimization highlights
The implementation adopts FlashAttention-style kernels: attention is computed in tiles with an online softmax, so peak GPU memory drops and speed improves. A quick equivalence check against a naive implementation is sketched after this paragraph.
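To make the claim concrete, the following rough sketch compares the kernel against a naive PyTorch implementation. The outputs should match up to fp16 tolerance, while the naive version allocates the full (N, N) score matrix per head:

import math
import torch
import xformers.ops as xops

def naive_attention(q, k, v):
    # q, k, v: (B, N, H, D); move heads before seq_len for batched matmul
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))           # (B, H, N, D)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # (B, H, N, N)
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)        # (B, N, H, D)

q, k, v = (torch.randn(1, 256, 4, 32, device="cuda", dtype=torch.float16)
           for _ in range(3))
ref = naive_attention(q, k, v)
out = xops.memory_efficient_attention(q, k, v)
print(torch.allclose(ref, out, atol=1e-2))  # True within fp16 tolerance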
- Usage demo

from torch import Tensor, unbind
import xformers.ops as xops

# Assumes the DINOv2-style Attention base class (providing self.qkv,
# self.num_heads, self.proj, self.proj_drop) and its XFORMERS_AVAILABLE flag.
class MemEffAttention(Attention):
    def forward(self, x: Tensor, attn_bias=None) -> Tensor:
        if not XFORMERS_AVAILABLE:
            if attn_bias is not None:
                raise AssertionError("xFormers is required for using nested tensors")
            return super().forward(x)

        B, N, C = x.shape
        # One projection produces q, k, v: (B, N, 3, num_heads, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = unbind(qkv, 2)

        # Tiled kernel; the full (N, N) attention matrix is never allocated
        x = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias)
        x = x.reshape([B, N, C])
        return self.proj_drop(self.proj(x))
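A hypothetical usage sketch, assuming the DINOv2-style Attention base class above accepts a constructor like Attention(dim, num_heads=8, ...); the dim=384, num_heads=6 values are just illustrative ViT-S settings:

import torch

attn = MemEffAttention(dim=384, num_heads=6).cuda().half()
x = torch.randn(2, 197, 384, device="cuda", dtype=torch.float16)  # ViT-S tokens
y = attn(x)  # -> (2, 197, 384)

Note that because the class falls back to the parent forward when xFormers is unavailable, the same module still runs (more slowly and with higher memory) without the library installed, as long as attn_bias is None.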