Archives

Browse all articles in reverse chronological order and discover what interests you.

16 posts total

Timeline view

2025

16 posts

October 2025

4 posts

[Paper Review] DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

10-08

paper-review, LLM Serving & Systems, Model Optimization & Acceleration

11 min

[Paper Review] Marconi: Prefix Caching for the Era of Hybrid LLMs

10-08

paper-review, with-gpt, LLM Systems, Model Serving

11 min

[Paper Review] SGLang: Efficient Execution of Structured Language Model Programs

10-03

paper-review, with-gpt

19 min

[Paper Review] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

10-03

paper-review, with-gpt

18 min

July 2025

12 posts

[Paper Review] Inference-Time Hyper-Scaling with KV Cache Compression

07-29

paper-review, with-gpt

30 min

[Paper Review] Llama-Nemotron: Efficient Reasoning Models

07-29

paper-review, with-gpt, efficient-llm, system-optimization, inference-acceleration

23 min

[Paper Review] Kimi K2: Open Agentic Intelligence

07-26

paper-review, with-gpt, open-source, agentic-intelligence, RL-alignment, foundation-models

13 min

[Paper Review] Qwen 3 Technical Report

07-26

paper-review, foundation-models, with-gpt

13 min

K-Beauty: Beyond 'Accidental Success' to 'Structural Growth'

07-26

Industry Analysis, Cosmetics Industry

4 min

A Guide to Migrating from Jekyll to Hugo

07-20

daily

4 min

K-Beauty: On the Verge of a New Leap Forward?

07-20

Industry Analysis, Cosmetics Industry

6 min

Migrating My Blog from Jekyll to Hugo

07-20

daily

3 min

[Paper Review] Massive Activations in Large Language Models

07-09

paper-review, with-gpt

20 min

[Paper Review] Peri-LN: Revisiting Normalization Layer in the Transformer Architecture

07-09

paper-review, with-gpt

22 min

[Paper Review] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training

07-09

paper-review, with-gpt

23 min

[Paper Review] Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding

07-08

paper-review, with-gpt

17 min
