![[Paper Review] Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding](https://www.storagereview.com/wp-content/uploads/2025/07/image2-2-png-e1752234784623.webp)
[Paper Review] Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding
Paper Link: Helix Parallelism: Breaking the Latency-Throughput Wall of Ultra-Long LLM Decoding

TL;DR: Helix Parallelism schedules Attention …
arXiv: 2505.09343v1
Tags: Helix Parallelism, Tensor Parallelism, KV Parallelism, Mixture of Experts, Grouped Query Attention (GQA), FlashAttention, Parallelism for LLMs, System-Aware ML, Efficient Transformer Inference, Serving LLMs at Scale, Long Context Inference