![[Paper Review] Marconi: Prefix Caching for the Era of Hybrid LLMs](https://pbs.twimg.com/media/GdyLXO9W4AADox0.jpg)
[Paper Review] Marconi: Prefix Caching for the Era of Hybrid LLMs
Paper Link: Marconi: Rethinking Prefix Caching for the Hybrid LLM Era. TL;DR: Marconi introduces a prefix-caching framework for hybrid LLM …
11 minute read
2411.19379v3
Marconi
Hybrid LLM
Prefix Caching
Inference Optimization
FLOP-aware Scheduling
SSM
vLLM
Serving Efficiency
![[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://icml.cc/media/PosterPDFs/ICML%202024/32613.png)