![[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://icml.cc/media/PosterPDFs/ICML%202024/32613.png)
[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper Link · Structured State Space Duality: Unifying SSMs and Attention with Mamba-2 for 2–8× Acceleration · TL;DR: Structured State-Space …
18 minute read
Mamba
Mamba-2
Structured State Space Duality
SSD
State Space Models
SSM
Transformer
Attention Mechanism
Long Context
Efficient Training
FlashAttention
Sequence Modeling
Scaling Laws
Parallelism
GPU Acceleration
2405.21060v1