The Hidden Attention of Mamba Models
Ameen Ali*, Itamar Zimerman*, and Lior Wolf
School of Computer Science, Tel Aviv University
Abstract
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan and deploys the model in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to those of the self-attention layers in transformers, and allows us to peer inside the inner workings of the Mamba model with explainability methods.
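As a minimal illustrative sketch of this third view (the notation $\bar{A}_t$, $\bar{B}_t$, $C_t$ for the input-dependent parameters is assumed here and is defined precisely later in the paper), consider the per-channel selective SSM recurrence
\[
h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t, \qquad y_t = C_t h_t .
\]
Unrolling the recurrence yields a direct input-to-output map
\[
y_t \;=\; \sum_{j=1}^{t} \underbrace{C_t \Big( \prod_{k=j+1}^{t} \bar{A}_k \Big) \bar{B}_j}_{\tilde{\alpha}_{t,j}} \, x_j ,
\]
so the lower-triangular, data-dependent matrix $\tilde{\alpha}$ mixes the inputs in a manner analogous to the attention matrix of a transformer, which is what makes the attention-based comparison and the explainability analysis possible.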