With-Gpt

All posts under category "With-Gpt"

11 posts total
Sorted by date
[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper Link Structured State Space Duality: Unifying SSMs and Attention with Mamba-2 for 2–8× Acceleration TL;DR Structured State-Space …

18 minute read
Mamba Mamba-2 Structured State Space Duality SSD State Space Models SSM Transformer Attention Mechanism Long Context Efficient Training FlashAttention Sequence Modeling Scaling Laws Parallelism GPU Acceleration 2405.21060v1
[Paper Review] Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding

Paper Link Helix Parallelism: Breaking the Latency-Throughput Wall of Ultra-Long LLM Decoding TL;DR Helix Parallelism schedules Attention …

17 minute read
2505.09343v1 Helix Parallelism Tensor Parallelism KV Parallelism Mixture of Experts Grouped Query Attention (GQA) FlashAttention Parallelism for LLMs System-Aware ML Efficient Transformer Inference Serving LLMs at Scale Long Context Inference