Related Work
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
LLM in a Flash: Efficient Large Language Model Inference with Limited Memory
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
ProSparse illustrates two directions of sparsity here: forward and backward.
Because the output of ws (the gate projection) is sparse after ReLU, w1 (the up projection) can also be sparsified according to ws's result, i.e. by its outputs: rows of w1 whose gate channel is zero never need to be computed. In addition, w2 (the down projection) is sparsified by its inputs: columns of w2 that multiply zero entries of the intermediate activation can be skipped.
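This structure can be seen in a minimal numpy sketch (the names W_gate/W_up/W_down and all dimensions are illustrative, not taken from the paper): after the ReLU gate, only the active channels contribute, so only the corresponding rows of w1 and columns of w2 need to be touched.

```python
import numpy as np

d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
x = rng.standard_normal(d_model)

W_gate = rng.standard_normal((d_ff, d_model))  # "ws": gate projection
W_up   = rng.standard_normal((d_ff, d_model))  # "w1": up projection
W_down = rng.standard_normal((d_model, d_ff))  # "w2": down projection

# ReLU gate output is sparse: many entries are exactly zero.
gate = np.maximum(W_gate @ x, 0.0)
active = np.nonzero(gate)[0]          # indices of the non-zero gate channels

# Dense reference: full gated FFN forward pass.
y_dense = W_down @ (gate * (W_up @ x))

# Output sparsity on w1: compute only the active rows of W_up.
# Input sparsity on w2:  read only the active columns of W_down.
h_active = gate[active] * (W_up[active] @ x)
y_sparse = W_down[:, active] @ h_active

assert np.allclose(y_dense, y_sparse)
print(f"{len(active)}/{d_ff} gate channels active")
```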
After replacing the activation function with ReLU, ProSparse applies additional training techniques, in particular regularization on the activations, to obtain higher sparsity while preserving accuracy.
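A hedged sketch of the training side: the snippet below adds a plain L1 penalty on the ReLU gate activation to the loss. The exact regularizer form and its schedule in ProSparse may differ; treat this only as an illustration of regularizing activations toward sparsity.

```python
import torch

d_model, d_ff = 16, 64
W_gate = torch.randn(d_ff, d_model, requires_grad=True)
W_up   = torch.randn(d_ff, d_model, requires_grad=True)
W_down = torch.randn(d_model, d_ff, requires_grad=True)

def ffn_forward_with_reg(x, l1_coeff):
    """Gated FFN forward pass plus an L1 penalty on the ReLU gate activation."""
    gate = torch.relu(x @ W_gate.T)      # ReLU replaces the original activation
    h = gate * (x @ W_up.T)
    y = h @ W_down.T
    reg = l1_coeff * gate.abs().mean()   # pushes more gate entries to exact zero
    return y, reg

x = torch.randn(4, d_model)
y, reg = ffn_forward_with_reg(x, l1_coeff=1e-3)
loss = y.pow(2).mean() + reg             # placeholder task loss + sparsity penalty
loss.backward()
```

In practice the penalty coefficient can be ramped up over training steps so that sparsity increases gradually without hurting accuracy.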
Of course, actually turning this sparsity into speedup depends on specialized matrix-multiplication kernels; the paper provides matmul implementations accelerated by output sparsity and by input sparsity, respectively.
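The gist of such kernels, sketched in Python (a real implementation would be a fused GPU/CPU kernel; the function names here are hypothetical): given the indices of non-zero gate channels, the output-sparse matmul computes only the selected rows, and the input-sparse matmul reads only the selected columns.

```python
import numpy as np

def matmul_output_sparse(W, x, active):
    """Compute only the rows of W @ x selected by `active` (output sparsity, as for w1)."""
    out = np.zeros(W.shape[0])
    out[active] = W[active] @ x
    return out

def matmul_input_sparse(W, h, active):
    """Compute W @ h touching only the columns of W whose inputs are non-zero (input sparsity, as for w2)."""
    return W[:, active] @ h[active]

rng = np.random.default_rng(0)
W, x = rng.standard_normal((32, 8)), rng.standard_normal(8)
active = np.array([1, 5, 17, 30])                 # e.g. non-zero gate positions
h = matmul_output_sparse(W, x, active)            # sparse intermediate activation

W2 = rng.standard_normal((8, 32))
assert np.allclose(matmul_input_sparse(W2, h, active), W2 @ h)
```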