Here are all published articles, sorted by date in descending order.

[Paper Review] Inference-Time Hyper-Scaling with KV Cache Compression
Link to Paper Dynamic Memory Sparsification (DMS): Making LLM Hyper-Scaling a Reality with 8× KV Cache Compression One-Line Summary (TL;DR) DMS, combining a …
Paper Link Hydragen: The Secret Weapon for Decoding Large Batches with Shared Prefixes up to 32× Faster TL;DR By decomposing the prefix and suffix using softmax …
Paper Link Kimi K2: An Open-Source LLM’s Leap Toward Agentic Intelligence TL;DR With a 3-stage pipeline consisting of MuonClip pretraining + large-scale agentic …
Paper Link Qwen 3: The Evolution of a Giant MoE Language Model with Adjustable Reasoning Depth TL;DR (in one line) Qwen 3 couples a user-controllable Thinking …
In a previous post, “Is K-Beauty at a Crossroads for a New Leap?,” I discussed the paradigm shift and new opportunities facing K-Beauty. We looked …
A Guide to Migrating from Jekyll to Hugo When I decided to migrate from Jekyll to Hugo, I consulted numerous documents and tutorials but still encountered quite …