2505.00949v4

All posts under tag "2505.00949v4"

1 post total

Sorted by date

[Paper Review] Llama-Nemotron: Efficient Reasoning Models

Hydragen: The Secret Weapon for Decoding Large Batches with Shared Prefixes up to 32× Faster. TL;DR: By decomposing the prefix and …

Tags: 2505.00949v4, Hydragen, Prefix Caching, Shared Prefix Decoding, Efficient Inference, Softmax Decomposition, LLM Serving, Attention Optimization, FlashAttention, vLLM, Batch Inference, Matrix-Matrix GEMM, TensorCore Optimization