[Paper Review] DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
DroidSpeak: Reducing Prefill Latency by 1.7–3.1× through Cross-LLM Prefix-KV Reuse
TL;DR: When multiple LLMs share the same …
11 minute read
2411.02820v4
droidspeak
cross-llm-kv-reuse
prefix-kv / e-cache
contiguous-layer-recompute