![[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://icml.cc/media/PosterPDFs/ICML%202024/32613.png)
[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Structured State Space Duality: Unifying SSMs and Attention with Mamba-2 for 2–8× Acceleration. TL;DR: Structured State-Space …
18 minutes
Mamba
Mamba-2
Structured State Space Duality
SSD
State Space Models
SSM
Transformer
Attention Mechanism
Long Context
Efficient Training
FlashAttention
Sequence Modeling
Scaling Laws
Parallelism
GPU Acceleration
2405.21060v1
![[Paper Review] Massive Activations in Large Language Models](https://eric-mingjie.github.io/massive-activations/assets/main_teaser_final.png)