High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
14 points - last Sunday at 11:35 AM
vivahir215 last Sunday at 11:50 AM
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
jchandra last Sunday at 11:36 AM
[dead]