[Paper Review] Llama-Nemotron: Efficient Reasoning Models
Paper Link: Hydragen: The Secret Weapon for Decoding Large Batches with Shared Prefixes up to 32× Faster. TL;DR: By decomposing the prefix and …
23 minutes
2505.00949v4
Hydragen
Prefix Caching
Shared Prefix Decoding
Efficient Inference
Softmax Decomposition
LLM Serving
Attention Optimization
FlashAttention
vLLM
Batch Inference
Matrix-Matrix GEMM
TensorCore Optimization