[Paper Review] Inference-Time Hyper-Scaling with KV Cache Compression 07-29 paper-review, with-gpt 30 min
[Paper Review] Llama-Nemotron: Efficient Reasoning Models 07-29 paper-review, with-gpt, efficient-llm, system-optimization, inference-acceleration 23 min
[Paper Review] KIMI K2: OPEN AGENTIC INTELLIGENCE 07-26 paper-review, with-gpt, open-source, agentic-intelligence, RL-alignment, foundation-models 13 min
K-Beauty: Beyond 'Accidental Success' to 'Structural Growth' 07-26 Industry Analysis, Cosmetics Industry 4 min
[Paper Review] Peri-LN: Revisiting Normalization Layer in the Transformer Architecture 07-09 paper-review, with-gpt 22 min
[Paper Review] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training 07-09 paper-review, with-gpt 23 min
[Paper Review] Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding 07-08 paper-review, with-gpt 17 min